Psychological Bulletin 


Harry Helson, Editor
University of Texas 




Published Bimonthly by the
American Psychological Association


Vol. 58, No. 1                    January 1961























TECHNIQUES FOR THE STUDY OF GROUP 
STRUCTURE AND BEHAVIOR: 
II. EMPIRICAL STUDIES OF THE EFFECTS OF STRUCTURE 
IN SMALL GROUPS¹


MURRAY GLANZER² AND ROBERT GLASER
American Institute for Research and University of Pittsburgh 


An earlier paper (Glanzer & Glaser, 
1959) reviewed techniques for analyz- 
ing the structure of groups that had 
been permitted to form their own 
pattern of interaction. This paper 
reviews laboratory studies in which 
experimenters imposed different
structures on groups and measured
the effect of the structures on per- 
formance. 

¹ Prepared under Contract Nonr 2551(00), between the Office of Naval
Research, Psychological Sciences Division, Personnel and Training Branch,
and the American Institute for Research, as part of a research project on
team training and performance. The authors wish to thank Alex Bavelas,
Harold Guetzkow, Robert L. Hall, John T. Lanzetta, Seymour Rosenberg,
Marvin E. Shaw, and Gerald [...]

The laboratory studies focus on communication structure. A communication
structure is a set of positions with specified communication channels.
Between any two positions, there may be a two-way channel, a one-way
channel, or none at all. A channel is essentially the probability that a
message can pass in a given direction between two positions. It may be
defined more generally as the probability, $p_{ab}$, that a message can get
from Position a to Position b. This is not the probability that a will try
to send to b. It is, rather, the probability of his getting a message
through if he tries to send one. In most of the structures studied, the
channels are symmetric, i.e., $p_{ab} = p_{ba}$, and the channels are either
available or not, i.e., $p_{ab} = 0$ or 1.
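The definition of a channel as a probability $p_{ab}$ can be made concrete with a small sketch. The representation below (a nested dictionary, and the helper names `make_structure`, `is_symmetric`, and `is_all_or_none`) is an illustrative assumption, not anything used in the studies reviewed; it only shows how the two simplifying conditions, symmetry and all-or-none availability, might be checked.

```python
from itertools import permutations


def make_structure(positions, two_way_links):
    """Build p[a][b], the probability that a message sent by a reaches b.

    Channels not listed are closed (probability 0); listed links are
    opened in both directions (probability 1).
    """
    p = {a: {b: 0 for b in positions if b != a} for a in positions}
    for a, b in two_way_links:
        p[a][b] = 1
        p[b][a] = 1
    return p


def is_symmetric(p):
    """True when p_ab == p_ba for every ordered pair of distinct positions."""
    return all(p[a][b] == p[b][a] for a, b in permutations(p, 2))


def is_all_or_none(p):
    """True when every channel is either closed (0) or open (1)."""
    return all(p[a][b] in (0, 1) for a, b in permutations(p, 2))


# Hypothetical three-position chain a - b - c with two-way channels.
chain = make_structure("abc", [("a", "b"), ("b", "c")])

# Closing one direction (c can no longer reach b) gives a one-way channel.
one_way = make_structure("abc", [("a", "b"), ("b", "c")])
one_way["c"]["b"] = 0
```

Under this sketch the two-way chain satisfies both conditions, while the variant with a one-way channel remains all-or-none but is no longer symmetric.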

The studies are grouped in the 
following sections: The Initial Work, 
Variations and Further Analysis of 
the Basic Design, Testing the Limits 
of the Basic Design, Mathematical 
Analysis, Emphasis on Distribution 
of Functions in the Simulated Team, 
and Emphasis on Feedback and 
Learning. Tables summarize the 
main findings for the studies re- 
viewed, often including more details 
than those covered in the text. The 
tables introduce a number of neces- 
sary simplifications. When an in- 
vestigator employed several closely 
related measures, e.g., group morale, 
job satisfaction, status evaluation, 
findings on only one are included. 
Findings not presented in a form 
that permits evaluation are omitted. 
Findings on the effect of trials or 
learning, a statistically significant 
variable in almost all types of groups, 
are omitted unless especially relevant.
In order to permit comparison
between studies in both the tables 
and the text, the same terms will be 
used throughout for a set of related 
measures, even when this departs 
from the investigator’s usage. For 
example, morale will refer to a vari- 
ety of measures concerned with satis- 
faction in the experimental situation. 
For the same reason, although dif- 
ferent names are used in various 
studies for the same network, one 
name will be used throughout this 
review. 


THE INITIAL WORK


The area of communication struc- 
ture was opened up 12 years ago by 
Bavelas (1948) with a discussion of 
mathematical aspects of group struc- 
ture. The paper is Lewinian in tone, 
using the terminology of boundary, 
region, etc. The Lewinian boundary, 
however, is translated into the link 
or channel. This translation is of major importance for all the work
that follows. Bavelas then builds up a set of assumptions and definitions
concerning a collection of cells. He defines cell boundary, region, open
cell, closed cell, region boundary, chain, chain length, structure, cell
distance, cell-region distance, etc., and considers the factors that cause
each measure to vary, deriving theorems concerning the following: the
limits of the values for the various distances and other measures, the
relation between the distance measures and the spread of a change of state
in the structure, and characteristics of pathways within the structure.
Bavelas then shows how the various distances change as a function of
structure types (e.g., organizations with varying degrees of horizontal
coordination) and as a function of an increase in the number of levels in
the organization. He also discusses the role of special positions such as
liaison positions and possible applications of his approach.

The many provocative points 
raised in the paper were not directly 
followed by experimental work. Ex- 
perimental work was set off by a 
second, much simpler paper (Bave- 
las, 1950), which differs markedly 
from the first. The Lewinian tone 
has disappeared. For example, re- 
gions within structures are not men- 
tioned. Bavelas now discusses a few 
simpler concepts which readily gen- 
erate experimental situations. Com- 
plex concepts such as inner and 
outer regions and chains of connect- 
ing cells do not appear again in the 
work in this area. The only concepts 
that survive from the first paper are 
those of links and distances. The 
focus of the discussion changes, more- 
over, from the larger in situ group, 
e.g., an industrial organization, to 
the small laboratory group. 

In the second paper, Bavelas in- 
troduces the communication net- 
works which were to become stand- 
ard experimental arrangements. The 
channels of these networks are all 
two-way channels: a channel from a 
to b is also a channel from b to a. He also introduces the index of
relative centrality to describe the structures. The index of the relative
centrality (of Position x) is the ratio of the sum of the minimal distances
of all positions to all others over the sum of the minimal distances of
Position x to all others, or

$$C(x) = \frac{\sum_{x}\sum_{y} d_{xy}}{\sum_{y} d_{xy}} \qquad [1]$$

where $d_{xy}$ is the minimal distance between x and y. Many of the
investigations of group structure published after this paper focus on this
measure. In subsequent studies C(x) is also summed over all positions, x,
in the network to give net centrality.
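Equation 1 can be sketched in a few lines. The link lists below are an assumed labeling of the four five-man networks (the Wheel and Y are centered on position c), and the function names are illustrative; distances are found by breadth-first search over two-way channels.

```python
from collections import deque

# Assumed labelings of the five-man networks; each pair is a two-way channel.
NETWORKS = {
    "Circle": [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "a")],
    "Chain": [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")],
    "Wheel": [("c", "a"), ("c", "b"), ("c", "d"), ("c", "e")],
    "Y": [("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")],
}


def adjacency(links):
    """Map each position to the set of positions it shares a channel with."""
    adj = {}
    for x, y in links:
        adj.setdefault(x, set()).add(y)
        adj.setdefault(y, set()).add(x)
    return adj


def distance_sum(adj, start):
    """Sum of minimal distances from start to every other position (BFS)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return sum(dist.values())


def relative_centrality(links):
    """C(x): total of all distance sums divided by the distance sum of x."""
    adj = adjacency(links)
    sums = {x: distance_sum(adj, x) for x in adj}
    total = sum(sums.values())
    return {x: total / s for x, s in sums.items()}


def net_centrality(links):
    """Sum of C(x) over all positions in the network."""
    return sum(relative_centrality(links).values())


for name, links in NETWORKS.items():
    c = relative_centrality(links)
    rounded = {x: round(v, 1) for x, v in sorted(c.items())}
    print(name, rounded, round(net_centrality(links), 1))
```

With these labelings every Circle position has C(x) = 5.0, the Wheel center has 8.0, and the Y positions have 4.5, 4.5, 7.2, 6.0, and 4.0. Net centrality computed from unrounded values can differ in the last digit from a total obtained by adding rounded position values.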

The main question Bavelas now 
asks is the following: Is it possible 
that 
among several communication patterns, all 
logically adequate for the successful comple- 
tion of a specified task, one gives significantly 
better performance than another (p. 726)? 


In answer to this question, he de- 
scribes experimental results obtained 
by S. L. Smith (unpublished report) 
and Leavitt (1951) who use an ex- 
perimental arrangement that is the 
model for the majority of the subsequent
studies.

Five subjects were each given a list of symbols. Their task was to discover
which symbol they all had in common. The physical setting was arranged so
that some group members could send messages to each other, other group
members could not. Smith's and Leavitt's subjects sat around a table
partitioned into five sections with slots, some of which were open to allow
notes to pass between sections. The pattern of open slots determined the
communication pattern or structure. The subjects were free to use the open
communication channels in any way they wished. They were not told the
structure of the network.

The group’s task required two 
main steps: distribution of individual 
information so that some or all mem- 
bers had all the necessary informa- 
tion, and determination of the com- 
mon symbol. The task was com- 
pleted when all subjects gave the 
answer. Smith imposed two com- 
munication structures: Circle and 
Chain (see Figure 1) and finds that 
structure affects group performance. 
The Chain is more efficient than the 
Circle. The performance ascribed to individuals is related to their
positions. The central positions are most frequently seen as leaders.
Table 1 indicates the type of data analysis carried out in the network
studies. The two main independent variables are the patterns as units
(Circle versus Chain) and the positions within a given pattern (a, b, c, d,
and e).

Leavitt (1951) used the same phys- 
ical arrangement and problem as 
Smith did, but with four structures: 
Circle, Chain, Wheel, and Y (see 
Figure 1). His main positive findings 
are that the Wheel, Y, Chain, and 
Circle (most centralized to least 
centralized) rank in descending order 
(best to least) with respect to the fol- 
lowing: (a) speed of development of 
organization for problem handling 
(the Wheel, Y, and Chain were, more- 
over, stable once they developed 
their organization. The Circle was 
inconsistent, i.e., problem solving 
procedure never became fixed); (b) agreement on who the group leaders
were; and (c) satisfaction with the group. The ordering on these
characteristics correlates perfectly with the ordering of the values of the
net centrality index, $\sum_x C(x)$.

During the course of 15 trials, all 
the structures showed learning, re- 
ducing the time to complete trials. 
The networks did not, however, 
differ clearly from each other in 
speed or in learning rate. Leavitt as- 
serts that the Circle used more mes- 
sages and made more errors than the 
other networks. The interpretation 
of the data, however, is unclear since 
the analyses are based on a selection 
of the data, e.g., number of messages 
on successful trials. 

Leavitt, in analyzing the effects of position within a network (see Table
2), finds that the most central position sends the most messages and the
least central, the fewest. Subjects at the central position, moreover,
enjoy their jobs more than those at peripheral positions. The relation of
centrality to number of messages is to be expected, since the central
positions had to serve as relays for messages from the peripheral members.
Concerning the relation between position and morale, Leavitt (1951) offers
the following explanation:

In our culture, in which needs for autonomy,
recognition, and achievement are strong, it is
to be expected that positions which limit
independence of action (peripheral positions)
would be unsatisfying (p. 48).

FIG. 1. Five-man networks used by Smith and Leavitt with the relative
centrality index of each position and net centrality, $\sum_x C(x)$.
[Network diagrams not reproduced. Legible values: Circle, every position
5.0, net centrality 25.0; Chain, net centrality 26.1; Wheel, peripheral
positions 4.6, net centrality 26.4; Y, positions a and b 4.5, c 7.2, d 6.0,
e 4.0, net centrality 26.2.]


The dependent variables of the Smith and the Leavitt studies are the major
concern of the subsequent network studies. The variables fall into four
main classes: (a) efficiency: number of errors, correct completions, speed
of solution, number of messages; (b) leadership: positions named as leader,
agreement about leader; (c) morale: rating of group, rating of self; and
(d) organization: consistency, type. These dependent variables and the two
main independent variables, group structures and individual position within
group structures, form the basic framework of the network studies.

TABLE 1
SUMMARY OF SMITH'S DATA FROM BAVELAS (1950)

[Table body not recoverable. Column headings: Average total errors; Average
incorrect completions; Frequency of occurrence of recognized leader at
position. Legible entries: Circle, 14.0 and 5.0; Chain, 5.3.]


VARIATIONS AND FURTHER ANALYSIS
OF THE BASIC DESIGN


The work of Bavelas, Smith, and 
Leavitt proliferated into an abun- 
dance of network studies. The first 
of these was a study by Heise and 
Miller (1951), introducing the fol- 
lowing variations in the original pro- 
cedures: (a) Communication took 
place over an intercom system. The 
subjects could, therefore, listen or 
speak simultaneously to as many of 
the other subjects as the network 
permitted. (b) Communication content was highly restricted. The
subjects could only relay the words on a
given list. (c) The communication 
network included one-way as well as 
two-way channels. The five three- 
man structures used are presented in 
Figure 2. (d) Intensity of noise was 
varied over the networks. 


TABLE 2
SUMMARY OF LEAVITT'S (1951) FINDINGS FOR
STRUCTURALLY DISTINCT POSITIONS

[Table body not recoverable. Column headings: Pattern; Position; Mean
number of messages sent; Mean job enjoyment. Rows: Circle, Chain, Wheel.
Legible entries: Circle, positions a, b, c, ..., 65.6; remaining values
illegible.]

Using a task in which the subjects had to reconstruct a master list of
words on the basis of incomplete lists, Heise and Miller find that: (a) As
the signal-to-noise ratio in the intercom channels was lowered, the number
of words spoken, errors, and the time required to complete the task
increased for all networks. (b) With increased noise, the differences
between networks became more pronounced. Generally, inefficiency of
performance, measured by either the number of words spoken or the time
required to finish the task, increased from Pattern 1 to 5 (in Figure 2). A
second task, in which the subjects had to reconstruct a 25-word sentence
based on parts given to each of them, gave similar results, except that
Networks 2 and 3 were somewhat more efficient than Network 1 at the high
noise levels. When, however, the subjects were given anagram problems in
which communication between the subjects was not necessary, the results
were as follows: intense noise decreased the number of words spoken; there
was no systematic difference between the efficiency of the various nets.

FIG. 2. Three-man networks used by Heise and Miller. [Diagrams of Networks
1 through 5 not reproduced.]

Aside from its introduction of a 
greater variety of channel arrange- 
ments, the main contribution of the 
study is probably that it demon- 
strates that no network is best in all 
situations. The efficiency of a struc- 
ture depends on the characteristics 
of the task. Thus, in one of the first 
network studies, the complex inter- 
actions that will mar the apparent 
simplicity of the early findings ap- 
pear. 

Guetzkow and Simon (1955) in- 
troduced the distinction between two 
classes of behavior in the network: 
direct problem solving behavior, such 
as relaying information and asking 
questions; and organizational be- 
havior, such as assigning of roles 
and functions to team members. 
They hypothesize that communica- 
tion restrictions affect only the abil- 
ity of the group to organize; once the 
group is organized, however, the dif- 
ferent structures are equally efficient 
in solving the problems. To test their 
hypothesis, they used three five-man 
networks: Circle, Wheel (see Figure 
1), and All-Channel (see Figure 4). 
Under their variant of the network 
situation, a group member could send 
only coded problem information dur- 
ing trials, but could send any kind of 
message during the intertrial periods. 

On the basis of the characteristics 
of the networks, Guetzkow and 
Simon predict that the Wheel should 


be highest in efficiency, the All- 
Channel intermediate, and the Circle 
lowest. 

The Wheel groups would have the least diffi- 
culty, for they have no channels to eliminate, 
no relays to establish, and already have one
person occupying a dominant position in the 
net. The All-Channel groups would have the 
next grade of difficulty, since the elimination 
of excess channels and the evolution of one 
person as solution-former are both required, 
yet relays need not be established. The Circle 
groups should have the most difficulty, for 
they need both to establish relays and to 
evolve an asymmetrical arrangement among 
the positions. They also must do some elim- 
inating of unneeded channels, although this 
last requirement is minimal (p. 240). 


Their findings on speed of problem 
solution (which also agree with 
Leavitt’s contention concerning the 
Wheel and the Circle) bear out this 
prediction. 

They cite the following as evidence 
that the structures affect organiza- 
tional efficiency: The interaction 
patterns were most stable (same 
channels consistently used) in the 
Wheel and least stable in the All- 
Channel; the greatest degree of dif- 
ferentiation of function is found in 
the Wheel, the least in the Circle. 
They show, furthermore, that if only 
the stable groups of each network are 
compared, then there are no longer 
differences in the speed in problem 
solution. They cite that finding as 
evidence that the communication re- 
striction does not affect the problem 
solving directly. 

Guetzkow and Dill (1957) follow 
up this study with an investigation of 
what happens during the trial periods, 
in which communication was limited 
to exchange of coded information, 
and during the intertrial “organizing” periods. They reanalyze the
Guetzkow-Simon data with respect to two factors, “local learning” (see
Christie, Luce, & Macy, 1952, below), which occurs during the trials, and
“planning mechanism,” which functions during the intertrial period,
and conclude that All-Channel shows 
the most planning activity while 
Wheel shows the least, presumably 
because its organization is dictated 
by the communication net. They 
furthermore note that the Circle 
network is handicapped in organiz- 
ing itself during the intertrial period 
by the network restrictions, whereas 
the All-Channel structure does not 
seem to have this difficulty. 

In order to explore this point,
Guetzkow and Dill obtained new
data by running groups of subjects 
under an alternating structure con- 
dition. During the task trials, the 
groups were run as Circles. During 
the intertrial periods, all communica- 
tion restrictions were removed by 
opening the barred channels, giving 
an All-Channel net. These new ex- 
perimental groups are called Circle— 
All-Channel. Guetzkow and Dill 
(1957) hypothesize that 
task performance in a restricted net will be 
equal to that in an unrestricted net, if the 
restrictions are removed during the intertrial 
period so that a relay system may be organ- 
ized (p. 191). 
An analysis of task trial times failed 
to support the hypothesis. Circle— 
All-Channel groups do not differ in 
performance time from the Circle 
groups in the earlier experiment. 
All-Channel groups were, moreover, 
significantly faster than Circle—All- 
Channel groups. The main contribu- 
tion of the two studies above is their 
suggestion concerning the ways in 
which communication structure im- 
pedes the group’s attempts to or- 
ganize itself for its work. 

Goldberg (1955) brings to the net- 
work study a new task, the unstruc- 
tured group decision task, and a new 
dependent variable, influence (or, 
more precisely, “influenceability”).


He hypothesizes that in group deci- 
sions, central positions in a network 
would be influenced less than periph- 
eral positions. He placed subjects 
in the five-man Wheel, Y, and Chain 
and showed them a card bearing a 
number of dots. The subjects then 
communicated with each other and 
settled on an estimate of the number 
of dots. Influence, measured by the 
amount that a subject changed his 
initial estimate during the experi- 
mental session, is found to be nega- 
tively related to the centrality of the 
position only for the Y network. He 
finds, however, a positive relation be- 
tween the centrality of a position and 
the number of leader nominations. 

Trow (1957) develops a point
made by Leavitt (1951, p. 49) into 
the hypothesis that centrality pro- 
duces high morale and status not 
just because centrality implies 
greater access to communication 
channels, but because greater access 
to channels gives autonomy—ability 
to make independent decisions. Trow 
argues that though centrality and 
autonomy are usually correlated, 
they can be separated experimentally 
and that when they are separated, 
autonomy will be found to be the 
effective variable. He accomplished 
this separation by placing his sub- 
jects in apparent three-person chains 
and passing prepared notes to them 
to create the illusion of a group. 
Trow varied autonomy by giving 
some subjects a code book needed in 
planning the group’s task and in- 
forming other subjects that some- 
one else in the group had the code. 
He also gave the subjects a ques- 
tionnaire to measure their need for 
autonomy. 

The major findings are the following: autonomy produces a higher level of
job satisfaction than does dependence; the effect of centrality upon
satisfaction is not significant. The relation holds primarily for the
high-need subjects. Trow concludes that “autonomy may be considered as
mediating the observed relationship [found by Leavitt] between centrality
and satisfaction” (p. 208). Predictions concerning a parallel effect of
autonomy on self-ascribed status were not supported. Status was, however,
affected by centrality.

TABLE 3
SYNOPSIS OF INITIAL AND FOLLOW-UP STUDIES

| Investigator | Task | Network | Independent variable | Dependent variable | Findingsᵃ |
|---|---|---|---|---|---|
| Bavelas (Smith) (1950) | Determining common symbol | Chain, Circle (5-man) | Network; Position Centrality | accuracy; leader nomination | N-a: Ch>Cc; PC-ln: + |
| Leavitt (1951) | Determining common symbol | Chain, Circle, Wheel, Y (5-man) | Network; Position Centrality | speed; accuracy; leader nomination; morale; number of messages | N-s: 0ᵇ; N-a: Y>Cc; N-nm: Ch, Wl, Y<Cc; PC-ln: +; PC-m: +; PC-nm: + |
| Heise and Miller (1951) | Reconstruction of word lists [remainder illegible] | Chain, one-way and two-way channel Circles (3-man) | Network; Noise; Task | speed; accuracy; number of words | N-s: +; Network X Noise X Task interaction, NXNsXT-s, a, nw: + |
| Guetzkow and Simon (1955) | [illegible] | All-Channel, Circle, Wheel (5-man) | Network | speed; organizational stability; message content | N-s: Wl>A-Cl>Cc; N-os: Wl>Cc>A-Cl |
| Guetzkow and Dill (1957) | [illegible] | All-Channel, Circle, Circle-All-Channel (5-man) | Communication Restriction during Intertrial Organizing Period: Circle vs. All-Channel | speed; message content | ComR-s: 0 |
| Goldberg (1955) | Group decision on number of dots | Chain, Wheel, Y (5-man) | Network; Position Centrality | “influenceability”; leader nomination | PC-infl: 0; PC-ln: + |
| Trow (1957) | Modified common symbol problem | Simulated Chain | Position Autonomy; Position Centrality; Need for Autonomy | morale; status | PA-m: +; PA-st: 0; PC-m: 0; PC-st: + |

ᵃ The abbreviations in the Findings column of this and the following
synoptic tables are derived from the independent variable and the dependent
variable. They read as follows:
N-s: + = Network (independent variable) has an effect on speed (dependent
variable).
PA-m: 0 = Position Autonomy does not have an effect on morale.
If the independent variable is at least an ordinal measure, then the symbol
+ takes on added meaning, signifying the direction of the relationship. In
these cases
Ns-s: − = Noise level is negatively related to speed.
Ns-nw: + = Noise level is positively related to the number of words
transmitted.
If the independent variable is a nominal measure, then the findings are
abbreviated as follows:
N-s: Wl>A-Cl>Cc = Network affects speed, with Wheel faster than All-Channel
which is faster than Circle.
Inequalities in such findings are always given with the superior groups on
the left. Thus,
N-a: Y>Cc = Network affects accuracy, with Y better than Circle,
but N-nm: Wl, Y, Ch<Cc = Network affects number of messages, with Wheel, Y,
and Chain better (requiring fewer messages) than Circle.
ᵇ The interpretation of the finding does not agree with the investigator's.
The studies summarized in this section exemplify the major developments
of the original theme: addition of new variables, e.g., noise, and
analysis of the structural variables 
into psychological components. A 
synopsis of these studies and the 
initial network studies is presented in 
Table 3. 


TESTING THE LIMITS OF THE BASIC
DESIGN


Shaw has systematically worked 
the area opened up by Bavelas, ex- 
tending the investigations to include 
such variables as amount and distribution of information, problem
complexity, and type of leadership.
He has also suggested additional con- 
cepts: independence of positions 
(rather than centrality) and satura- 
tion. 

Shaw (1954a) extended the network investigation to four-man groups (see
Figure 3). The names he assigns to his networks raise an interesting
question. Could not the four-man “Wheel” also be called a “Y”? The question
has importance in comparing results for networks that differ in size. There
is no empirical or rational basis for matching results from a four-man and
five-man “Wheel.” The only thing clear is that the number of distinct
patterns decreases as the number of group members decreases. Therefore,
although Chain, Wheel, and Y are distinct patterns for five-man groups,
when the number of members is reduced by eliminating a peripheral member,
only two of these three patterns remain: four-man Chain and Wheel-Y. If the
number of members is reduced again, the two remaining networks coalesce
into the simple three-man Chain. The difficulty caused by ignoring this
characteristic will be pointed out later.
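The collapse of patterns as group size shrinks can be checked mechanically. The sketch below is illustrative only: the link lists are an assumed labeling of the five-man Chain, Wheel, and Y, and the function names are hypothetical. It deletes each peripheral position in turn and records the sorted list of channel counts per remaining position. For connected four-position networks without loops that list is enough to tell the patterns apart, since only two such patterns exist.

```python
# Assumed labelings of three five-man networks (two-way channels).
FIVE_MAN = {
    "Chain": [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")],
    "Wheel": [("c", "a"), ("c", "b"), ("c", "d"), ("c", "e")],
    "Y": [("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")],
}

FOUR_MAN_CHAIN = (1, 1, 2, 2)    # two end positions, two middle positions
FOUR_MAN_WHEEL_Y = (1, 1, 1, 3)  # one hub joined to three peripheral positions


def degrees(links):
    """Number of channels attached to each position."""
    deg = {}
    for x, y in links:
        deg[x] = deg.get(x, 0) + 1
        deg[y] = deg.get(y, 0) + 1
    return deg


def reduced_patterns(links):
    """Patterns left after deleting each peripheral (one-channel) position."""
    deg = degrees(links)
    patterns = set()
    for leaf in [x for x, d in deg.items() if d == 1]:
        remaining = [(x, y) for x, y in links if leaf not in (x, y)]
        patterns.add(tuple(sorted(degrees(remaining).values())))
    return patterns


all_patterns = set()
for name, links in FIVE_MAN.items():
    all_patterns |= reduced_patterns(links)
```

Under these assumptions the three five-man networks yield only the two four-man patterns named in the text. The Y can reduce to either one, depending on which peripheral member is removed.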


Shaw finds that the centrality of position is related, as in the Leavitt
study, to number of messages sent, satisfaction, and frequency of
nomination as leader. He proposes, however, an alternative to centrality,
the related concept of independence, $I$, and constructs a measure of it.
Shaw then plots mean number of messages, morale, and recognition of
leadership against $I$ for his own and Leavitt's data. $I$ appears to give
plots that are more nearly monotonic than does centrality. The functions
are, however, not only complicated, but also differ in form for presumably
comparable Shaw and Leavitt data. For example, the equation relating morale
to $I$ is logarithmic for Leavitt's data and linear for Shaw's. The need for
a concept like independence in explaining behavior within the networks had
been expressed by Leavitt (1951) and has received empirical support from
Trow (1957). Shaw's $I$, however, is an awkward and complex combination of
variables. Since he gives no statistical evaluation of the improvement of
$I$'s fit of the data over the fit yielded by centrality, it is difficult to
judge whether $I$'s smoother curves compensate for its greater complexity.
(The comparability of Leavitt's and of Shaw's data is of concern. Shaw told
his subjects what the network structure was. Leavitt and the investigators
following his procedures did not. Other aspects of the presumed
comparability of data are discussed later on, e.g., Shaw, 1954b.)

The data Shaw considered above 
were drawn from a separately pub- 
lished study (1954c) aimed at testing 
the hypothesis that the distribution 
of information affects the behavior of 
networks. Since the more central 
positions usually have more informa- 
tion than the other team members 
during the major part of a trial, the 
effects of centrality and amount of in- 
formation are ordinarily confounded. 
Shaw, therefore, varies the amounts 
of information initially given to posi- 
tions within three networks. In this 
way, he separates to some extent the 
effects of the two variables. 

He uses the three four-man communication patterns depicted in
Figure 3: Circle, Wheel (or Y), and
Slash. The groups solved arithmetic 
problems for which each team mem- 
ber held some of the necessary infor- 
mation. In some teams, all members 
had the same amount of information. 
In other teams, the information was 
unequally distributed with the posi- 
tions marked a in Figure 3 receiving 
more information than the others. 

He finds that central positions and 
the positions with the larger amount 
of initial information tended to solve 
the problems more quickly. There 
were no significant effects of networks 
or distribution of information condi- 
tions on network speed. Here, and in 
the following studies, Shaw centers 
much of his data analysis on the 
higher order interactions, e.g., net- 
work with information distribution 
with trials. Since his hypotheses and 
his conclusions are not at this level, 
attention will be given primarily to 
main effects. 

The results on number of messages 
as related to network (Circle versus 
Wheel) and as related to position 
centrality agree with Leavitt's find- 
ings for five-man groups. In general, 
the Wheel required fewer items to 
reach a solution than the Slash or 
Circle and central positions sent more 
messages than peripheral positions. 
Shaw also finds that positions given 
more information sent more items 
than did the same positions under 
equal distribution of information. 

What is the meaning of a relation 
between the number of messages (a 
measure used by Leavitt, Shaw, and 
the investigators who follow them) 
and position differences? Since the 
different positions have to send differ- 
ent minimum numbers of messages 
to complete a trial, it is not very en- 
lightening to note that differences 
appear. In a five-man chain with 
each man holding one item of infor- 
mation, the end men have to send 
only one message in order to assure 
complete distribution of their infor- 
mation. The central man has to 
transmit five items. It is necessary to 
relate the number of messages sent 
by a position to the minimum for the 
position. Otherwise, it is as if an 
experimenter reported significant dif- 
ferences in the number of responses 
by two experimental groups when 
one group of subjects is requested to 
name two items each, the other 
requested to name only one. 
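
The correction proposed above can be sketched in a few lines: divide each position's observed message count by the minimum that position must send. This is only an illustration. The minima for the end and central men are the figures quoted in the text for the five-man chain; the observed counts and the function name are hypothetical.

```python
def relative_message_load(observed, minimum):
    """Return observed messages per position divided by that position's minimum."""
    return {pos: observed[pos] / minimum[pos] for pos in observed}


# Minima as stated in the text for the five-man chain (end man: 1, central man: 5).
minimum = {"end": 1, "center": 5}
# Hypothetical observed counts, invented for illustration only.
observed = {"end": 3, "center": 10}

ratios = relative_message_load(observed, minimum)
```

With these invented counts the central man sends more messages in absolute terms but fewer relative to what his position requires, which is the distinction the raw comparison hides.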

Shaw does not find that differences 
in network affect the number of 
errors, although unequal distribution 
of information lowers the number of 
errors significantly. On the other 
hand, leadership, measured in terms 
of preference in a sociometric ques- 
tionnaire, was related to centrality 
but not to information distribution. 
Similarly, group morale measures 
and individual morale or job satis- 
faction ratings were, as in the Leavitt 
study, related to centrality. They 
were not, however, related to infor- 
mation distribution. 

Gilchrist, Shaw, and Walker (1954) 
explored the effect of distribution of 
information further by giving addi- 
tional information not only to pe- 
ripheral, but also to central positions 
in the four-man wheel. Their three 
experimental conditions consisted of 
an equal distribution of information, 
an unequal distribution to the pe- 
riphery (one peripheral subject re- 
ceiving more information than the 
others), and an unequal distribution 
to the center (the center subject 
receiving more information). Distribution 
of information did not have 
any significant effect on overall group 
performance as seen in time scores, 
error scores, sociometric choices, number 
of message units, and leadership 
emergence. It did have an effect on 
behavior at individual positions. Increasing 
the initial information, in 
general, decreased the time scores 
and increased the number of messages 
transmitted, job satisfaction, 
and position status rating. The 
investigators’ expectations concerning 
the order of the time scores are not 
met. The central position with additional 
information has a higher time 
score than the peripheral positions 
with additional information and also 
a higher time score than the central 
position under equal information 
distribution. Primarily to explain 
the latter result, they introduce the 
concept of saturation, defined as the 
input and output requirements which 
are imposed on positions within a 
group structure. The concept, a 
promising one which suggests that 
communication requirements may 
counteract the effects of centrality, 
is explored in a subsequent investigation 
(Shaw, 1955). 

Shaw (1956) investigated the effect 
of another aspect of information dis- 
tribution in communication networks: 
random versus systematic distribu- 
tion. In solving an arithmetic prob- 
lem consisting of four distinct steps, 
each member of a four-man group 
may have all the information items 
necessary to complete one of the 
steps. This is called systematic 
distribution. A random distribution is 
one in which each of the information 
items is assigned at random; a 
member, therefore, usually has to go 
to several sources (other members) 
for the information to complete one 
step of the problem. This type of ex- 
perimental operation brings the net- 
work study close to the situations 
used by Lanzetta and Roby in their 
manipulation of “dispersion of information 
sources” (see below). Shaw 
predicts that systematic distribution 
will increase efficiency and job satis- 
faction and that the increase will be 
greater if the subjects are informed 
about the system of distribution and 
if the network permits freedom of 
action—e.g., All-Channel as com- 
pared with Wheel. Analysis of time 
to solution in the Wheel and All- 
Channel networks, in part, supports 
Shaw’s predictions. Knowledge of 
distribution is not, however, signifi- 
cant as a main effect or in interaction 
with distribution of information. 
Networks, also, do not differ signifi- 
cantly on the time measures. 

Another follow-up (Shaw, 1954b) 
of the distribution of information 
study by Shaw (1954c) attempts to 
reconcile an apparent discrepancy 
between his and Leavitt’s (1951) 
study. Leavitt found some evidence 
that the five-man Wheel network 
solves problems faster than the five- 
man Circle. Shaw's four-man Circles 
are somewhat faster than his four-man 
Wheels. The speed difference is, 
however, not statistically significant. 
Shaw suggests that the difference 
stems from the difference in the com- 
plexity of the problems used. This is 
essentially Heise and Miller’s (1951) 
point that different structures will be 
best for different tasks. Shaw’s 
(1954b) main hypothesis is 

that a communication net in which all Ss are 
in equal positions (the circle) will require less 
time to solve relatively complex problems but 
more time to solve relatively simple problems 
than will a communication net in which one S 
is placed in a central position (the wheel) 
(p. 211). 


To test this hypothesis, Shaw gave 
simple (common letter) and complex 
(arithmetic) problems to the three- 
man Wheel and Circle (Networks 3 
and 1, respectively, in Figure 2). 
Two points should be noted con- 
cerning his structures: they are three- 
man groups not five-man groups, as 
in the Leavitt study, and not four- 
man groups, as in the other Shaw 
study; the question raised earlier 
concerning the naming of networks 
may be raised again. Why is the 
third pattern in Figure 2 called a 
“Wheel” rather than a “Chain”? 
With these points in mind, it seems 
unlikely that differences in the results 
of a study of five-man groups and a 
study of four-man groups can be re- 
solved by a study of three-man groups. 
Resolution is especially unlikely since 
the Chain, which, according to Leavitt, 
tends to be slower than the Circle, 
and the Wheel, which tends to be 
faster than the Circle, reduce to a 
single network in the three-man 
group. Shaw identifies this network 
with the fast Wheel. It could just as 
well be identified with the slow Chain. 
In any case, Shaw’s main hypothesis 
is that the interaction of problem 
complexity with network has an effect 
on solution time. The results tend to 
support his prediction, but are not 
statistically significant. Analysis of 
the number of items communicated 
and errors does not add any support 
to the hypothesis. 

The problem complexity with net 
centrality interaction is pursued in 
one more study in which Shaw (1958) 
manipulates complexity by the addi- 
tion of irrelevant information to 
arithmetic problems given to the 
four-man All-Channel and Wheel. 
The evidence is again unclear. A 
significant effect of the interaction is 
found on the number of messages but 
not on time to solution. 

Shaw (1955) has better luck with a 
study of the effect of saturation and 
independence. He elaborates these 
concepts and through them arrives 
at the variables of the classic Lewin, 
Lippitt, and White study (1939): 
autocratic versus democratic leadership. 
He assumes that the leader’s 
style affects both saturation and 
independence: “autocratic” leaders 
decrease both the independence and 
saturation of the followers and 
“democratic” leaders increase both. 
Independence is assumed to improve 
performance and morale, with a 
greater effect on morale. Saturation 
is assumed to lower performance and 
morale, with a greater effect on per- 
formance. From these assumptions, 
Shaw derives the following predic- 
tions: autocratic leaders will promote 
better performance than democratic 
leaders, autocratic leaders will cause 
poorer morale, and differences be- 
tween central and peripheral posi- 
tions will be accentuated by auto- 
cratic leadership. 

The two leadership conditions were 
used with the four-man Wheel, Kite, 
and All-Channel (see Figure 3) solv- 
ing arithmetic problems. The sub- 
ject at Position 5 in the network was 
assigned the role of leader and was 
instructed to be either “autocratic” 
or “democratic” in his handling of 
directions and suggestions. As Shaw 
predicted, the autocratic groups are 
higher in efficiency and lower in 
morale. Although analysis of data 
for individual positions confirms pre- 
vious findings that the central posi- 
tions solve the problems more quickly, 
send more messages, and have higher 
morale, it does not confirm Shaw’s 
prediction that autocratic leadership 
will increase the difference on these 
measures between central and peripheral 
positions. It might be added 
that in the Lewin, Lippitt, and White 
study, autocratic and democratic 
leadership styles generated an analog 
of the Wheel and the All-Channel, 
respectively. Under autocratic lead- 
ership, for example, most of the 
group’s communications were directed 
at the leader. Shaw's study, there- 
fore, may be viewed as involving two 
types of manipulation of communication 
structure: direct manipulation 
through the elimination of channels 
and indirect manipulation through 
the effects of the leader’s style. 

In the preceding studies by Shaw 
and his associates, the groups worked 
two to four problems. The effect of 
prolonged experience was investi- 
gated by Shaw and Rothschild (1956). 
Groups in the Wheel, Slash, and All- 
Channel structures (see Figure 3) 
solved two arithmetic problems a day 
for 10 days. The usual analyses are 
made of time scores, number of mes- 
sage units transmitted, and satis- 
faction ratings. The results, to some 
extent, agree with the results of previ- 
ous studies (Shaw, 1954c, 1955). 

The merging, seen in the study on 
leadership style, of Shaw’s network 
investigations with the more con- 
ventional social psychological tradi- 
tion continues in a study by Shaw, 
Rothschild, and Strickland (1957) in 
which they use unstructured group 
discussion tasks. Each member of the 
group starts with all the information 
required for a decision. The group 
members have to interact only to 
reach an agreement on the solution. 
The networks differ significantly in 
the time required to reach a decision. 
The Wheel requires the longest time 
and the All-Channel, the shortest. 
The finding agrees, to some extent, 
with Shaw and Rothschild’s findings 
on the same networks solving arith- 
metic problems. Two other experi- 
ments reported in this article investi- 
gate the effect of the position within 
a network upon the ability of an in- 
dividual to maintain nonconforming 
opinion. These experiments are 
similar to Goldberg’s experiment 
(1955). The results, in general, indi- 
cate that the amount of change that 
a subject is willing to make is a func- 
tion of the amount of support and 
opposition he faces rather than any 
position characteristic. The data on 
the relation between position cen- 
trality and tendency to be influenced 
do not, however, permit clear inter- 
pretation. Goldberg, it may be 
recalled, finds no overall relation be- 
tween centrality and tendency to be 
influenced. 

In summary, Shaw and his associ- 
ates have exhaustively worked the 
area opened by Bavelas and Leavitt. 
They have also introduced new con- 
cepts, e.g., independence and satura- 
tion, which are worth further exami- 
nation. Their work forms a major 
body of data concerning the effect of 
structure on group behavior. An 
overall summary of these findings is 
presented in Table 4. A glance at the 
variables employed in successive 
studies indicates that the area has 
been worked not only exhaustively, 
but to exhaustion. 

After a promising start, the ap- 
proach has led to many conflicting 
results that resist any neat order. 
Perhaps more significant as a symp- 

MATHEMATICAL ANALYSIS 


Christie, Luce, and Macy and their 
associates have carried out an inten- 
sive program of investigation of be- 
havior in group networks. They have 
emphasized “pure” structural characteristics 
and have subjected their 
data to detailed mathematical and 
logical analysis. The full range of 
their approaches to network behavior 
is set forth in two reports (Christie 
et al., 1952; Luce, Macy, Christie, & 
Hay, 1953). In the first report, they 
discuss the various aspects of network 
behavior extensively, and analyze 
data obtained in a series of experimental 
studies. 

One of the studies was concerned 
with the effects of learning on per- 
formance in the networks in Figure 4. 
Christie (1954) later published results 
for the five-man Circle, Chain, All- 
Channel, and Pinwheel.* The groups 
solved a series of 25 list reconstitu- 
tion problems like those in the Heise- 
Miller (1951) study. An “action 
quantization” restriction was im- 
posed in order to simplify the data 
for analysis: 


The subjects were required to send single-address 
messages at prescribed times, to include 
in their messages all problem information 
known to them at the time, and to write 
nothing other than problem information. . . . 
Thus, each message-sending action, hereinafter 
called an act, was a simultaneous sending 
by the whole group (p. 189). 

* This section draws both from results presented 
in the larger report (Christie et al., 
1952) and from the separate report by Christie 
(1954). 


In their data analyses, these in- 
vestigators use the minimum number 
of acts in which it is possible for a 
network to complete its task as base- 
lines. The minimum possible number 
of acts is an important consideration 
which has been neglected in other 
network studies. All networks in 
Figure 4 have minima of three acts 
except for the Chain with a minimum 
of five. Chain-(X) and Circle-(X) 
are topologically the same as Chain 
and Circle, respectively. They differ 
from Chain and Circle only in the 
physical arrangement of the positions. 
The investigators computed the dis- 
tribution of the number of acts re- 
quired for completion on the assump- 
tion that the group members dis- 
tribute their information at random 
over their channels. It is not surpris- 
ing that comparison of the theoretical 
with the observed number of solutions 
shows that the subjects do better 
than chance from the start. Clear 
differences in learning efficiency between 
networks are also demonstrated. 

Christie (1954) summarizes the 
results for four of the networks as 
follows: 

Groups using the totally connected net- 
work [All-Channel] do somewhat better than 
random but show a negligible amount of 
learning. The groups in chain learn well, but 
their performance is good only with respect 
to the chain minimum of five acts per trial. 
The high minimum for this network makes 
its absolute performance poor in comparison 
to each of the other networks. The pinwheel 
network performs somewhat better than 
random, and its random distribution is a 
favorable one [i.e., the mode is close to the 
minimum]. Like totally connected it learns 
little, so that its final distribution is prac- 
tically the same as that in totally connected. 


Circle is the one very different case; it 
achieves the best distribution [i.e., it fre- 
quently completes the task in the minimum 
possible] in the final block as a result of excel- 
lent learning (p. 193). 
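
The act minima cited above (three for the Circle, five for the Chain) can be checked by exhaustive search. The sketch below is an illustration, not the investigators' procedure. It assumes one reading of the action quantization rule: in every act each member sends exactly one message to one neighbor, and the message carries everything the sender knew at the start of that act. The function name and the encoding of knowledge as bit masks are choices made here.

```python
from itertools import product

# Five-man networks as adjacency lists; positions are numbered 0 to 4.
CHAIN = [(1,), (0, 2), (1, 3), (2, 4), (3,)]
CIRCLE = [((i - 1) % 5, (i + 1) % 5) for i in range(5)]


def min_acts(neighbors):
    """Breadth-first search for the fewest acts after which all members know all items."""
    n = len(neighbors)
    full = (1 << n) - 1                      # bit mask meaning "knows every item"
    start = tuple(1 << i for i in range(n))  # each member starts with one item
    frontier, seen, acts = {start}, {start}, 0
    while frontier:
        if any(all(k == full for k in state) for state in frontier):
            return acts
        next_frontier = set()
        for state in frontier:
            # Every member picks one addressee; all messages are sent simultaneously.
            for targets in product(*neighbors):
                new = list(state)
                for sender, target in enumerate(targets):
                    new[target] |= state[sender]
                new = tuple(new)
                if new not in seen:
                    seen.add(new)
                    next_frontier.add(new)
        frontier = next_frontier
        acts += 1
    return None  # unreachable for a connected network


chain_minimum = min_acts(CHAIN)
circle_minimum = min_acts(CIRCLE)
```

The All-Channel network is left out only because its 1,024 joint choices per act make this naive search slow. Since it contains the Circle, its minimum cannot exceed the Circle's.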


Christie, Luce, and Macy introduce 
the concept of “locally rational” behavior 
to explain the differences between 
networks on the basis of behavior 
at the individual positions. 
Locally rational behavior is the 
tendency to send successive messages 
to different stations so as to maximize 
the amount of new information received 
by neighboring positions. 
“[T]he behavior called for depends only 
on each subject’s attending to conditions 
immediate to his own position, 
i.e., to whom he has sent and from 
whom he has received” (Christie, 
1954, p. 195). The investigators used 
Monte Carlo runs on a computer in 
order to obtain the theoretical dis- 
tribution of the number of acts to 
completion under both the equiprob- 
able random behavior and the locally 
rational behavior model for each net- 
work. In a final summary of their 
work, Christie, Luce, and Macy 
(1956) show that in successive trials, 
network performance (the distribu- 
tion of number of acts) approaches 
more and more closely that of the 
locally rational model. It is not clear, 
however, whether this generalization 
holds for the All-Channel network. 
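
A Monte Carlo comparison of this kind can be sketched as follows. The equiprobable model picks an addressee at random on every act. The "locally rational" rule used here is a simplifying assumption, not the investigators' exact model: a member never repeats his previous addressee when another channel is open. All names and parameters are hypothetical.

```python
import random

CHAIN = [(1,), (0, 2), (1, 3), (2, 4), (3,)]
CIRCLE = [((i - 1) % 5, (i + 1) % 5) for i in range(5)]


def acts_to_completion(neighbors, locally_rational, rng, max_acts=1000):
    """Simulate one trial and return the number of acts until everyone knows everything."""
    n = len(neighbors)
    full = (1 << n) - 1
    known = [1 << i for i in range(n)]  # bit mask of items known at each position
    last_sent = [None] * n
    for act in range(1, max_acts + 1):
        targets = []
        for i in range(n):
            options = list(neighbors[i])
            # Assumed rule: avoid the previous addressee if an alternative exists.
            if locally_rational and last_sent[i] is not None and len(options) > 1:
                options = [j for j in options if j != last_sent[i]]
            targets.append(rng.choice(options))
        new_known = list(known)
        for sender, target in enumerate(targets):
            new_known[target] |= known[sender]  # message holds start-of-act knowledge
            last_sent[sender] = target
        known = new_known
        if all(k == full for k in known):
            return act
    raise RuntimeError("trial did not finish within max_acts")


def simulate(neighbors, locally_rational, trials, seed):
    """Return the acts-to-completion for a number of independent trials."""
    rng = random.Random(seed)
    return [acts_to_completion(neighbors, locally_rational, rng) for _ in range(trials)]


random_runs = simulate(CIRCLE, False, trials=200, seed=1)
rational_runs = simulate(CIRCLE, True, trials=200, seed=1)
chain_runs = simulate(CHAIN, True, trials=200, seed=2)
```

Tabulating `random_runs` and `rational_runs` gives the two theoretical distributions against which observed performance on successive trials could be compared.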

In analyzing the learning of sub- 
jects within the various networks, 
they also pay attention to the im- 
portance of differences in the proba- 
bility of various initial act patterns. 
For example, in the Circle a minimum 
solution is possible only with 
an initial pattern of two mutual 
interchanges and one unreciprocated 
message (e.g., a with b, c with d, and 
e to a). In the Chain, any initial pattern 
of acts can result in a minimal 
solution. 

They find another stimulus for 
experimental work and analysis in 
information theory concepts. In a 
study on coding noise presented in 
the main technical report (Christie 
et al., 1952) and published separately 
by Macy, Christie, and Luce (1953), 
they examine the effects of ambiguity 
of stimuli, interpreted as semantic 
noise. (Heise and Miller, 1951, stud- 
ied the effect of acoustic noise in the 
communication channel.) They used 
the five-man Circle, Chain, Wheel, 
and Pinwheel (see Figures 1 and 4). 
Another variable, labeled “feedback,” 
is introduced by giving some Wheel 
groups additional information con- 
cerning errors at the end of each 
trial. 

The groups’ task was to discover 
the color of a marble that all held in 
common. Fifteen problems with 
marbles of clearly identifiable color 
were followed by 15 trials with am- 
biguous stimuli—marbles of mixed, 
indistinct color. The authors state 
that the data on speed and number of 
messages agree with Leavitt's results. 
Their main findings on learning to 
handle the ambiguous stimuli do not 
give a simple picture. The Circle 
reduces its errors markedly over suc- 
cessive trials. The other structures 
do not. The explanation of these 
results may lie in Shaw's (1954b) 
hypothesis, as yet undemonstrated, 
that centralized structures are handicapped 
on complex problems. Introduction 
of additional feedback in the 
Wheel network seems to improve 
performance somewhat. 

They pursue their informational 
analysis with an estimate of “conditional 
receiver entropy” based on the 
number of different marbles called 
by the same name. They point out 
that the method by which the efficient 
networks reduce ambiguity 
seems to be by an increase in redundancy 
(computed in terms of the 
number of extra names given to a 
marble). The behavior of the networks 
is further analyzed qualitatively 
in terms of error feedback (the 
opportunity of the members to ob- 
tain the same information from at 
least two different sources) and the 
opportunity to correct errors (the 
presence of symmetric, i.e., two-way 
channels). The Pinwheel lacks the 
latter, while the Wheel, and to some 
extent the Chain (in its end mem- 
bers), lacks the former. The presence 
of both, the investigators argue, is 
necessary for optimal performance. 
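
The conditional receiver entropy mentioned above can be illustrated with the standard conditional entropy of the marble given the name used for it. This is a generic information-theoretic sketch, not necessarily the estimator the investigators computed. The function name and the sample data are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy(pairs):
    """Return H(marble | name) in bits for a list of (name, marble) observations."""
    pairs = list(pairs)
    total = len(pairs)
    by_name = defaultdict(Counter)
    for name, marble in pairs:
        by_name[name][marble] += 1
    entropy = 0.0
    for counts in by_name.values():
        uses = sum(counts.values())       # how often this name was used
        for count in counts.values():
            p = count / uses              # p(marble | name)
            entropy -= (uses / total) * p * log2(p)
    return entropy


# Each name picks out one marble: the receiver faces no ambiguity.
unambiguous = conditional_entropy([("red", "m1"), ("blue", "m2")])
# One name is used equally often for two marbles: one bit of ambiguity.
ambiguous = conditional_entropy([("blue", "m1"), ("blue", "m2")])
```

The measure rises as more different marbles share a name, which is the sense in which extra, more specific names reduce ambiguity.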
Christie et al. (1952) try to carry 
out detailed analysis and derivation 
of every type of data generated by 
the original network studies. They 
try to derive the distribution of group 
latency data on the basis of assump- 
tions concerning the individual la- 
tency distribution. They analyze the 
determinants of leader designation, 
using an index based on 

the relative frequency of use of a channel (on 
an equiprobable sending basis) and the 
mean input of the sending end of the channel 
as an estimate of the sending end’s value to 
the receiving end (p. 179). 


Their index fits the obtained values 
for two networks rather well. 

They devote a similarly detailed 
analysis to the determinants of job 
satisfaction, following their general 
approach of using the individual 
position as a basis for prediction. An 
index called input potential which 
considers the input density for each 
position is found to be more highly 
correlated with job satisfaction than 
peripherality. The formula for input 
potential gives some idea of the level 
of the analysis. 


where I is mean input. 

In a subsequent report (Luce et al., 
1953), they carry out the same types 
of analyses and also examine the additional 
problem of the effects of change 
of network structure, with subjects 
trained in one network and tested in 
another. The later work is even more 
complex, involving a multiplicity of 
approaches, and does not lend itself 
readily to summary. Since their work 
is primarily concerned with analytic 
techniques, Table 5, which includes 
only those empirical results that are 
comparable to the results in the other 
summary tables, does not do justice 
to the full range of their work. The 
general philosophy and major accomplishments 
of their research effort are 
summarized by Christie et al. (1956). 

TABLE 5 

Synopsis of Christie, Luce, Macy Studies 

Investigator | Task | Network 
Christie (1954); also in Christie, Luce, and Macy (1952) | Reconstruction of number list | All-Channel, Chain, Circle, Pinwheel (5-man) 
Macy, Christie, and Luce (1953); also in Christie, Luce, and Macy (1952) | Determining common ambiguous marble | Chain, Circle, Pinwheel, Wheel (5-man) 

[The remaining columns of Table 5 (two headed "variable" and one headed "Findings") survive only as fragments: Trials; Network; number acts to solution; amount of learning; accuracy.] 

The main tendency of the analysis 
by Christie, Luce, and Macy is away 
from functions involving overall measures 
of the group, e.g., network centrality, 
and toward the derivation of 
the behavior of the group from that 
of the individual positions. Their 
efforts may be considered to parallel 
Shaw's. Just as he carried the empirical 
work in the area as far as it 
can go, so do they carry the mathematical 
analysis to the limit. In both 
cases, it was desirable to have the 
job done. It seems unlikely now, 
however, that the payoff will be 
commensurate with the energy and 
ingenuity that were invested. This 
could, of course, be discovered only 
by the doing. 

With the efforts of both Shaw and 
his associates and of Christie, Luce, 
and Macy carried as far as they can 
go, a new approach or new definition 
of the field seems necessary. The remaining 
sections of this paper will 
review some of the attempts at reanalyzing 
or redefining the area. 

RETROSPECT AND PROSPECT 

At this point, it is appropriate to 
look back at the problem as originally 
stated and its expression in experi- 
mental form. 

Two main questions posed by 
Bavelas are the following: What 
effect does the structure of the group 
have upon its efficiency? What effect 
does position in the group have upon 
the subject’s morale and job satis- 
faction? There is no simple answer 
to the first question. The effect of 
structure depends in part on the re- 
quirements of the task (Heise & 
Miller, 1951). Contrary to Leavitt's 
original generalization (1951), in a 
number of studies the highly cen- 
tralized structures are less efficient 
than other structures (Macy et al., 
1953; Shaw, 1958; Shaw et al., 1957). 
The answer to the second question is 
somewhat clearer. Morale seems to 
be a function of centrality of position. 
The psychological basis for this rela- 
tionship, however, warrants further 
analysis. Explanations have been 
offered in terms of autonomy (Trow, 
1957), independence (Shaw, 1954a), 
and input potential (Christie et al., 
1952). 

The unclear answers to the first 
question may arise from the peculiar 
experimental situation used to ex- 
press it. The characteristics of the 
original Bavelas-Leavitt situation 
that recommend it are its apparent 
experimental simplicity and relevance 
to real-life situations. Does the 
situation actually have these charac- 
teristics? That the situation is not 
simple is evidenced by the introduc- 
tion of techniques to simplify it fur- 
ther, e.g., action quantization. Even 
with the imposition of further restric- 
tions, however, a precise analysis of 
the activity of the groups is unman- 
ageably complex. 


That the situation is far distant 
from most familiar real-life situations 
can be seen by reviewing the special 
characteristics of the laboratory net- 
works. They are the following: 


1. Interdiction of certain channels. This 
is the most obvious of the special characteris- 
tics of the laboratory networks. To some ex- 
tent, this corresponds with conditions in 
natural groups. Some communication chan- 
nels are frequently closed to members of 
groups. For example, a man may not be per- 
mitted to go over the head of his immediate 
supervisors in a work group, or he may be 
unwilling to make certain statements when 
another member is present. 

2. Ignorance concerning other positions. 
This is probably both the effective and the 
really unique aspect of the communication 
restriction. The network members know very 
little about other positions and about behav- 
ior of any except adjacent positions. This is 
a condition that does not hold in small groups. 
The effect of this factor can, of course, be re- 
duced to some extent by changes in the pro- 
cedures of the network studies. In the Guetz- 
kow-Simon (1955) study, for example, this 
may have been done by having intertrial ad- 
ministrative discussions. 

3. Necessity of each member. In almost all 
the network studies, each member is essential, 
because each member holds an essential piece 
of information and each member must present 
a solution to the problem. In some cases, one 
member may have more or less information, 
but in almost all the studies the elimination 
of one member prevents success of the group. 
This is not generally true in real-life groups. 


These special characteristics of the 
network studies would make gen- 
eralization difficult even if the findings 
were unequivocal. The applicability 
of the findings of the network studies 
is in question because the characteristics 
of the structures employed in the 
studies are very different from those of other 
small groups. The following point, 
however, may be argued: If the net- 
work studies have any application, it 
will not be in the small group, but in a 
much larger unit such as an indus- 
trial corporation or an army. Char- 
acteristics analogous to those listed 
above are more clearly present in 
large groups. For example, depart- 
ments of a company may not have 
direct communication channels; they 
often lack information concerning 
distant sections and all departments 
may be necessary for the company to 
function. 

If the laboratory network cannot 
be viewed as a simplification of the 
general small group situation, can it 
be viewed as a laboratory simplifica- 
tion to permit testing of an explicit 
theory about group behavior? The 
answer, unfortunately, is no. At the 
present time, a theory concerning 
behavior in the network does not 
exist. This raises a major point. Per- 
haps the most surprising thing about 
the entire area is that despite the 
highly formal origins of these studies 
(Bavelas, 1948), the organized body 
of theory promised by the approach 
has not yet appeared. 

Perhaps in response to considera- 
tions such as these, two attempts 
have been made to use a somewhat 
different approach to the study of the 
effects of group structure on behavior. 
One of these attempts is by Lanzetta 
and Roby. Their main aim is to draw 
the experimental situation closer to a 
known type of group—the military 
work team. The other attempt is by 
Rosenberg and Hall. Their main 
objective is to simplify the experi- 
mental situation (two-person situa- 
tions) and rephrase the problem so 
that available theory—learning the- 
ory—can be brought to bear on the 
problem. Both approaches assign 
new definitions to the term structure. 
For Lanzetta and Roby, team struc- 
ture refers to the specialization and 
interrelation of jobs in a team. For 
Rosenberg and Hall, team structure 
refers to the degree to which the 
information that an individual re- 
ceives about his performance is con- 
founded by the performance of an- 
other team member. 


EMPHASIS ON DISTRIBUTION OF 
FUNCTIONS IN THE SIMULATED 
TEAM 

Lanzetta and Roby have directed 
one major attempt to examine, from 
a new viewpoint, the relation of group 
structure to performance. Their 
attempt is embodied in a series of 
studies in which they vary the ways 
that team members depend on each 
other for information. In a situation, 
quite unlike the Bavelas network, 
modeled after military teams, e.g., a 
bomber crew, they gave teams a series 
of very short problems in order to 
approximate a continuously changing 
environment. They vary communica- 
tion structure not by interdicting 
channels as in the network studies, 
but by restricting relevant informa- 
tion or specific functions to a given 
position. Their team is like the All- 
Channel network but with each sub- 
ject working on a separate problem 
and holding some information required 
by other team members. Despite 
these differences in experimental 
situation and in definition of structure, 
the basic concern remains the 
same: what factors in the organiza- 
tion of a group affect its performance? 
An early study by Lanzetta and 
Roby (1956b) indicates both the de- 
velopment of their experimental situ- 
ation and the type of practical situa- 
tion from which it grew. In this 
study, they investigate the effect of 
two methods of work distribution 
(work structure) under two task load 
conditions on group performance. 
They model the experimental situa- 
tion after an air defense center with 
two work structure conditions. In 
vertical structure, each group member 
had one of three tasks: tracking air- 
craft, identifying aircraft and keeping 
a record of the interceptors’ fuel 
status, or deploying friendly planes. 
In horizontal structure, each group 
member performed all three functions 
for his own targets. Variations of the 
number of airplanes produced two 
different task load conditions. Of the 
main independent variables of the 
study—structural organization, load 
conditions, and their interaction— 
only load condition has a significant 
effect. The interpretation of this ef- 
fect is, however, complicated by a 
significant interaction with sessions. 
The main outcome of the study was 
a methodological development rather 
than an empirical finding. It led to a
simpler task with higher reliability 
for use in the subsequent studies. 
This task, modeled after a bomber 
crew's task, was used in their next 
experimental study (Roby & Lan- 
zetta, 1956a) to demonstrate the 
effect of relaying requirements upon 
group performance. Groups of three 
subjects sat, each in a separate booth 
that contained instrument reading 
displays, pairs of control switches, 
and instructions giving the correct 
switch settings for each possible pair
of instrument readings. The instrument
readings required to set a given
control could be displayed in the 
booth containing the related control 
or they could be shown in one of the 
other booths. In the latter case, the 
subject receiving the information 
would have to relay it to its eventual 
user over the intercom system con- 
necting the booths. 

Roby and Lanzetta used four com- 
munication structures which differed 
in the degree to which the subjects 
had direct access to the information 
they needed. A significant difference 
in the number of errors appears be- 
tween the communication conditions. 
Analysis indicates that more errors 
are made on a control if its two rele- 
vant items of information had to 
come from two sources rather than 
from one source. The results cannot 
be considered surprising. If a subject 
has to get the necessary information 
from someone else, who is also busy, 
he will not do as well on a highly 
speeded task as a subject who has 
his information immediately avail- 
able. Furthermore, if he has to make 
two separate information requests in 
a brief (15-second) period, he will be 
more likely to fail than if he has to 
make only one request. 

Lanzetta and Roby (1956a) next 
consider the effect of type of input 
presentation on efficiency in two
communication structures employed 
in the previous study: a high de- 
pendence (or low autonomy) condi- 
tion in which each member had to get 
all of the instrument readings neces- 
sary to operate his controls from 
other team members; and a low de- 
pendence condition in which a mem- 
ber had three out of the four neces- 
sary instrument readings available 
in his booth. They varied two aspects
of the task input: task load (the time
interval between successive
presentations of instrument readings)
and the predictability of the order of
presentation to the three booths. They
find again that high dependence gives
rise to more errors, especially when
the information has to be relayed
from several different sources. For
both structure conditions, errors
increase as the rate of change of
instrument readings increases, but
predictability of the order of
instrument changes has no significant
effect.

These findings are further sup- 
ported in another study in which 
Lanzetta and Roby (1957) investi- 
gate learning and the details of 
communication behavior in their 
team situation. They vary dependence
(relaying requirement), task load
(speed of presentation of input), and
operating procedure as determined by
instructions to “volunteer”
information or to “solicit”
information.

In a later study, Roby and Lanzetta
(1957) consider the effect of
“load balancing” or distribution of
work. They used three structures
that varied the relation between the 
number of instrument displays and 
the number of control switches for 
which the subject was responsible. 
In Structure I (equal observation 
load) a booth had either one, two, or 
three control switches, but it always 
had two displays. In Structure II 
(unequal load) a booth that had one 
control switch had one display; a 
booth with two control switches had 
two displays; etc. In Structure III 
(balanced load) a booth with one 
control switch had three displays; 
a booth with two control switches 
had two displays; etc. The experi- 
mental design is quite complicated 
and confounds the load balancing 
and dependence variables. The authors,
however, conclude that “both load
balancing and autonomy are influential
but that the latter is more heavily
weighted in this task” (p. 174).

The major accomplishment of Lan- 
zetta and Roby is their introduction 
of controlled, experimentally manip- 
ulable tasks that capture more of the 
characteristics of real-life teams than 
do the earlier Circles and Wheels. 
They have also theorized extensively 
(Roby, 1957; Roby & Lanzetta,
1956b, 1958). The real payoff in 
their work will come, however, when 
theory and experimental work merge. 
Their theorizing consists of general 
statements that never arrive at the 
prediction or explanation of specific 
events. Without a theory to generate 
novel and testable predictions, the 
experiments usually establish the ob- 
vious, e.g., if a subject has to check 
with many people before he makes a 
response, he is not likely to complete 
the response in a short time period. 
Although Lanzetta and Roby have 
not completed the merger of theory
and experiment, they have brought
them several steps closer together. A 
summary of their experimental find- 
ings is presented in Table 6. 


EMPHASIS ON FEEDBACK 
AND LEARNING 


Rosenberg and Hall have recently 
examined the effects of group struc- 
ture from a different viewpoint than 
Lanzetta and Roby’s. Rosenberg and 
Hall see the composition of informa- 
tion feedback to the individual mem- 
bers as a key aspect of structure. 
They concern themselves, therefore, 
with the relation of structure, defined 
in terms of information feedback, to 
performance. Figure 5 illustrates the
basic structures they study. S‘ is the
stimulus which precedes a response,
R is the response, and $S^f$ is the
feedback stimulus, i.e., the state of
affairs in which the individual finds
himself after performing the response.
In the “direct” feedback condition the
$S^f$ reflects only the subject’s own
performance. With “confounded”
feedback the response of one subject
combines with that of another so that
his feedback is a function of his
teammate’s performance as well as
his own. With “other’s” feedback
the subject receives feedback solely
from someone else’s performance. In
order to investigate the relation of
these structures to performance,
Rosenberg and Hall have carried out a
series of studies using variations of an
experimental situation similar to that
of Sidowski, Wyckoff, and Tabory
(1956).

In their first study, Rosenberg and
Hall (1958) ran two-man groups under
the three structures described
above. The task was to learn to turn
a knob a required number of turns.
The amount of error ($S^f$ value) was
displayed to the subject after each
trial. Under direct feedback, each
subject had to learn to turn the knob
four times. Under confounded feedback,
the two team members had to
attain a team average of four turns.
They could reach this average by
totaling eight turns distributed in
any fashion between them. Under
“other’s” feedback, the subject had a
perfect score displayed only if his
partner turned the knob four times.
The design of the study permitted the
evaluation of both the effects of the
subject’s own feedback condition
and his partner’s feedback condition
(which could be different) upon the
subject’s performance. The dependent
variables were: individual accuracy,
team or average accuracy, and role
differentiation, a function of the
absolute difference between the
response magnitudes of the two
team members.

The subjects learn most rapidly 
and to the highest level of proficiency 
under direct feedback. With con- 
founded feedback the subject learns, 
but more slowly and to a lower level 
of proficiency. There is no improve- 
ment in individual accuracy under 
“other’s” feedback. The partner’s
feedback condition has no significant 
effect on the subject’s accuracy. 
With respect to team product, con- 
founded feedback yields team accu- 
racy (average performance) that is 
at least as good as that obtained with 
direct feedback. “Other’s” feedback
gave clearly inferior team perform- 
ance. In the confounded feedback 
condition, one subject evidently 
learned to make two turns if his part- 
ner persisted in making six turns so 
that both subjects would have an av- 
erage of four. Rosenberg and Hall 
label this compensatory difference 
between response magnitudes, role 
differentiation. They find that
the confounded feedback condition
shows more role differentiation than
the direct feedback. The “other’s”
feedback condition, however, shows 
the greatest amount of all. Rosen- 
berg (1959b) also considered the effect 
of switching subjects from a direct 
feedback situation to other structures. 
After the switch, the three structures 
show the same effects as above. 
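
The three feedback structures and the compensatory pattern described above can be made concrete with a small sketch. It is an illustration only, not the procedure of Rosenberg and Hall: it assumes that the displayed error is the signed deviation of the relevant turn count from the four-turn target, and the names `feedback_error` and `TARGET_TURNS` are hypothetical.

```python
TARGET_TURNS = 4  # required number of turns in the first study


def feedback_error(own_turns, partner_turns, structure):
    """Return the error displayed to a subject under one feedback structure.

    Assumption (not from the source): the displayed error is the signed
    deviation of the relevant turn count from TARGET_TURNS.
    """
    if structure == "direct":
        shown = own_turns  # only the subject's own response counts
    elif structure == "confounded":
        shown = (own_turns + partner_turns) / 2  # team average
    elif structure == "other":
        shown = partner_turns  # only the partner's response counts
    else:
        raise ValueError(f"unknown structure: {structure}")
    return shown - TARGET_TURNS


# Compensatory pattern: two turns against a partner's six turns.
for structure in ("direct", "confounded", "other"):
    print(structure, feedback_error(2, 6, structure))
```

Under these assumptions the confounded structure shows zero error for the two-and-six pattern although each member is individually two turns off target, which is why compensation can be learned there but is invisible to the subject under direct feedback.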

Hall (1957), using similar appa- 
ratus, investigated two independent 
variables: type of pretraining, and
the relative weights assigned to the 
responses of the team members dur- 
ing confounded feedback. He varied 
pretraining conditions by pretraining 
some subjects under direct feedback 
and others under the same con- 
founded feedback conditions they 
received during later trials. The
experimenter used two confounded
feedback weightings, equal and
unequal. In equal weighting, he fed
back the mean of the two members’
responses or

$$S^f = \frac{1}{2}R_1 + \frac{1}{2}R_2$$

as in the previous experiments. In
the unequal weighting, he weighted
the responses of one member three
times as heavily as the other, i.e.:

$$S^f = \frac{3}{4}R_1 + \frac{1}{4}R_2$$

The dependent variables were team 
accuracy and role differentiation. 
The feedback weighting conditions 
do not have any significant effect on 
the dependent variables during either 
pretraining or training. In discussing 
the results, Hall emphasizes the com- 
pensatory behavior that occurs in the 
confounded feedback situation. 
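
The two weightings can be compared on a single pair of responses. The sketch below is illustrative rather than a reconstruction of Hall's apparatus: it assumes the two weights sum to one, so that "three times as heavily" gives weights of 3/4 and 1/4, and the helper `weighted_feedback` and the response values 2 and 6 are hypothetical.

```python
from fractions import Fraction


def weighted_feedback(r1, r2, w1):
    """Combine two responses with weights w1 and 1 - w1.

    Assumption (not from the source): the weights sum to one.
    """
    w1 = Fraction(w1)
    return w1 * r1 + (1 - w1) * r2


# Illustrative responses: member 1 makes 2 turns, member 2 makes 6 turns.
equal = weighted_feedback(2, 6, Fraction(1, 2))    # mean of the two responses
unequal = weighted_feedback(2, 6, Fraction(3, 4))  # member 1 weighted 3:1
print(equal, unequal)
```

With these values the equal weighting feeds back 4 and the unequal weighting feeds back 3, so the same pair of responses that is exactly compensatory under equal weighting is no longer so when one member's response dominates.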

Zink (1957) carried out a further 
study in this series using a more com- 
plex task and a different rule for 
determining feedback. Contrary at 
least to the reviewers’ expectations, 
the results indicate greater role differ- 
entiation for the simple task than for 
the complex task. Rosenberg (1959a)
later tried to produce role
differentiation in Zink’s complex task
by pretraining the subjects under
direct feedback. His hypothesis was
that the subjects had not reached a
level of proficiency in Zink’s complex
task sufficient to permit them to
adjust to a partner’s behavior. He is,
however, unable to obtain differences
in role differentiation between
subjects given different amounts of
direct feedback pretraining.


In a final set of three experiments, 
Rosenberg (1960) systematically ex- 
plored the effect of various combina- 
tions of feedback weights on team 
performance. He also varied the 
informational content of the feed- 
back by letting some groups know 
only that an error had occurred and 
by informing other groups about both 
the occurrence and the direction of 
the error. On the basis of detailed 
consideration of the effects of feed- 
back upon the response of the sub- 
jects in the various structures, Rosen- 
berg makes predictions concerning 
the development of complementary 
or cooperative behavior. In general, 
he finds that more stable response 
patterns develop as the amount of 
information concerning the direction 
of errors increases. If both subjects 
have a feedback weight of .50 or more 
on their own response, then their 
combined responses tend to stabilize 
at some optimal value, i.e., one in 
which both members receive maxi- 
mum reinforcement. The accuracy 
of the group product is therefore 
maximized. 

With these experiments, this group 
has moved very far from the original 
network studies. In their earlier 
work (Rosenberg & Hall, 1958), 
communication between the team 
members dropped out as an explicit 
independent variable. In the last 
study, amount of reinforcement re- 
ceived replaces group accuracy as the 
dependent variable of primary inter- 
est. The work of Rosenberg and Hall 
has certain basic similarities to the 
work of Lanzetta and Roby. Here 
again the experimenters accomplish 
a very able reduction of the real-life 
team to laboratory proportions. The 
contribution with respect to methods 
is considerable. Here again, however, 
the work generates obvious results. 
The one study (Rosenberg, 1960) 
with novel and systematically related 
results is one that has moved away
almost completely from the variables 
of the early studies of group struc- 
ture. The work of Rosenberg and 
Hall is summarized in Table 7. 

It is hoped that as the methods in 
the area are improved, theories which 
can tie together disparate findings 
and generate new predictions will 
develop. Rosenberg and Hall have 
done even more than help prepare 
the methodological groundwork for 
the phase of theorizing that is needed 
now. By reducing social interaction 
to feedback conditions, they have 
prepared the way for an attack with 
the armament of learning theory. 
(This has actually begun in work 
being carried out by Burke, 1959, and 
his associates.) Whether such an at- 
tack can be made without giving up 
the original objective of studying 
group structure remains to be seen. 


SUMMARY AND CONCLUSIONS 


Since the initial stimulus provided 
by Bavelas in 1948 there has been a 
considerable effort spent on the study 
of the effect of structure upon group 
and individual behavior. The main
original questions posed were: What
effect does the structure of the group 
have upon the efficiency of its be- 
havior? What effect does position in 
the group have on morale and job 
satisfaction? There is no clear answer 
to the first question. The answer to 
the second question is that central 
positions in general are more satisfied 
with their tasks than peripheral posi- 
tions. 

Later investigators went beyond 
the first two questions to study other 
variables. Heise and Miller intro- 
duced a task complexity variable, 
the condition of communication inter- 
ference (noise), and one-way chan- 
nels. Guetzkow and his collaborators 
introduced the distinction between 
task behavior and organizational ac- 
tivity. Shaw continued the original 
trend of the experimental work and 
also investigated the effects of various 
types of distribution of information 
and task complexity. Christie, Luce, 
and Macy brought mathematics and 
information theory to bear on the 
communication networks. They
presented the theory of “locally
rational” behavior to explain learning
in the networks and differences in
performance between networks.

Neither the straight empirical work 
nor the mathematically sophisticated 
analyses have approached the goal, 
implicit in Bavelas’ original
questions, of a rational system for
arranging groups to maximize
efficiency and satisfaction. The
difficulties in building such a system
may stem from the peculiar char- 
acteristics of the Bavelas network 
and the absence of a theory to order 
the data it generated. 

In response to these difficulties, 
more recent investigators have re- 
oriented the work on group structure. 
Lanzetta and Roby have redefined 
structure in terms of direct versus
indirect accessibility of task informa- 
tion and distribution of task infor- 
mation. Under this type of definition 
they have constructed new types of 
groups and tasks. These investigators
also have made some moves
toward meeting the need for a theory
in the area. Rosenberg and Hall have 
attempted to rephrase the problem 
and redesign the experimental setting 
so that learning theory can play the 
organizing role. To do this, they 
define structure in terms of the effect 
of one subject’s responses on another 
subject’s feedback (reinforcement) 
and have studied the effect of various 
feedback arrangements on group 
(dyad) and individual behavior. 

At the present time, there is still a 
major need for a system to order the 
data already obtained and to direct 
further work on the effects of group 
structure. The difficulty in construct- 
ing this system may arise from the 
inappropriateness of either the ex- 
perimental situations or the concepts 
that have been used. Attempts have 
been made to remedy both of these 
possible defects. The success of these 
attempts will determine whether this 
review is a prologue or an epitaph. 


ON SECONDARY REINFORCEMENT AND
SHOCK TERMINATION¹


ROBERT C. BECK²
University of Illinois 


The concept of secondary rein- 
forcement has been extremely useful 
to psychological theorists and experi- 
ments centered around it have 
brought fruitful and important addi- 
tions to our knowledge of behavior. 
Repeatedly, it has been demonstrated 
in animal experiments that stimuli 
which are paired with food or water
will gain the power to “reinforce”
behavior, either to sustain old
responses or fixate new ones. However,
while reinforcement theorists have
generally made no distinction be- 
tween the functional properties of 
food and water reinforcement and the 
reinforcement provided by the ter- 
mination of noxious stimuli, almost 
all attempts to use the latter type of 
event as the basis for establishing 
secondary reinforcement have pro- 
duced negative results. Yet, for 
example, when electric shock is used
to motivate and reinforce behavior 
directly, learning is powerful and 
prompt. 

If secondary reinforcement cannot
be established with the termination
of a noxious drive, there would seem
to be little point to using drive
reduction as the fundamental
reinforcement mechanism in theoretical
systems which at the same time lean
heavily on secondary reinforcement,
as, for example, Hull (1943), Mowrer
(1956, 1959), and Miller (1951) have
done. Indeed, the failure to
demonstrate this phenomenon could be
construed as presumptive evidence in
favor of the opposite view: namely,
the argument that secondary
reinforcement is a phenomenon
involving motivational increments,
particularly those related to the
stimulating properties of the
anticipatory goal response. Spence
(1956) and Seward (1952) have argued
for this position.
In brief, then, while one would
hesitate to mention such a
will-o-the-wisp as a “crucial
experiment,” we cannot take lightly
Mowrer’s (1959) suggestion that the
fate of drive reduction theory may
rest on the successful demonstration
of secondary reinforcement established
in conjunction with the reduction of
shock or some other noxious
stimulation. The purpose of the
present paper is to evaluate the
evidence on the problem with an eye
toward finding and/or proposing
experimental tests of the hypothesis
that secondary reinforcement can be so
established.


¹ Part of this material was originally
presented in an unpublished doctoral
dissertation, University of Illinois,
1958.

² Now at Wake Forest College,
Winston-Salem, North Carolina.

The writer wishes to express his
appreciation to O. H. Mowrer for
critical discussions and comments on
this material and to G. R. Grice,
L. I. O'Kelly, and W. F. Crowder for
their able suggestions.


EXPERIMENTAL EVIDENCE 

For convenience the experimental 
evidence is categorized as follows, ac- 
cording to the method used in testing 
for reinforcement: (a) response acqui- 
sition, including bar pressing, T maze 
learning, head turning, and pushing a 
nose-key; (b) response extinction; (c)
delay of reinforcement; (d) response 
facilitation; and (e) preference testing 
(other than T maze). 


Response Acquisition


Bar pressing. One of the first to 
claim positive results on this problem 
was Barlow (1952). In the two ex- 
perimental groups of interest here, a 
5-second light either came on (a) in 
the last 5 seconds of a 10-second 
shock, or (b) immediately after the
shock. After a single such pairing 
each rat was tested 20 hours later, 
with total duration of bar pressing as 
the measure of reinforcement. For 
half the animals in each training 
group the light was continuously on 
and could be turned off by pressing 
the bar, while for the other half the 
light was off and could be turned on. 
The animals in Group a showed no 
significant difference in duration of 
bar pressing, though tending to turn 
the light on more than off. The
animals in Group b showed a significant
difference in this same direction.
These results are very weak support- 
ing evidence and give rise to two 
points of interest. First, secondary 
reinforcement was supposed to have 
been established with a single pair- 
ing. This result may be compared 
with those of Bersh (1951), who 
found that 80 light-food pairings did 
not produce a reinforcement effect 
significantly different from zero pair- 
ings. Second, significant results were 
found for the group in which the 
light came on after shock termina- 
tion, but not for the group in which 
the light preceded it. The light in this 
instance could have come before, or 
have been paired with a discriminable
drop in drive, thus accounting for the
apparent backward conditioning.
But, it would appear from other 
sources that this particular sequence 
might bring about backward aversive
conditioning, not positive
conditioning. Razran (1956) concludes
from his review of the literature that
“with shock as the US, backward
conditioning seems to be possible only
when the CS is applied after the shock
has ceased, and not when it is applied
during the action of the shock.”
Mowrer and Aiken (1954) found that 
a signal following immediately upon 
the termination of shock later in- 
hibited a food-reinforced response. It 
is consequently possible that in 
Barlow’s experiment the light was
aversive and when the subjects
pressed the bar the onset of the light
caused them to “freeze” on it, thus
producing relatively long durations of 
pressing. Perhaps the duration meas- 
ure is not the most satisfactory index 
of reinforcement. 

Littman and Wade (1955) used a 
tail-shock apparatus to pair a light 
with shock termination. In a differ- 
ent apparatus, with light as the rein- 
forcement for bar pressing, the rats 
did not press more than control sub- 
jects. Deutsch (1956; see also Litt- 
man & Wade, 1956) raised several 
questions about this experiment, the 
most pertinent of which concerns the 
use of a different apparatus for test- 
ing than that used for training. 
Direct evidence regarding the po- 
tency of a secondary reinforcer in 
transituational testing is not plenti- 
ful, and it is probably unwarranted 
to assume secondary reinforcement 
should be obtainable in a situation 
very different from that in which the 
subjects are trained, even with ap- 
petitive reinforcement. 

Beck (1958) trained rats to escape 
from grid shock with a lighted T 
maze door and a tone serving as cues. 
After 180 training trials there were 
two 10-minute test periods during 
which the subjects were locked in the 
choice area of the maze with a newly- 
introduced bar. Shock was on con- 
tinuously during testing and escape 
was not permitted. When the bar was 
pressed by the subjects in the
experimental group the light and tone came
on for 2.5 seconds. In the first of two 
replications of this experiment there 
was some indication that the light and 
tone were reinforcing bar pressing, 
but the effect was not strong and did 
not appear in the second replication. 
To further complicate the results, 
animals trained to escape the shock 
without any light or tone made as 
many or more bar presses with these
as “reinforcers” as did the main
experimental group.³ A major difficulty
in using grid shock is that the animals 
can get “primary” reinforcement by 
hitting upon any movement or pos- 
ture which reduces the pain. Such 
responses can either compete with 
the test response or facilitate it, 
thereby increasing variability and 
possibly eliminating significant differ- 
ences which might otherwise have 
been obtained. 

T maze learning. The paradigm for
the T maze experiments is
fundamentally the same as Saltzman’s
(1949) experimental design. After
rats have been reinforced in a dis- 
tinctive goal box at the end of a 
straight runway, the reinforcing prop- 
erties of the goal box are tested by 
putting it on one arm of a T maze, 
comparing turns to this box by ex- 
perimental and control groups. 
Smith and Buchanan (1954) used
an amount-of-reinforcement design
with this paradigm. Omitting the
various controls, they trained one
group of rats to run across an
electrified grid to get food in a black
goal box and across a sponge runway to
get food in a white goal box. A
second group had the color of the
goal boxes reversed. The goal box
associated with both food and shock
reduction should take on greater
secondary reinforcing capacity than
the goal box associated with food
alone. In a black-white discrimination
situation, with the black goal
box of a T maze positive, it was
predicted that the animals previously
shocked prior to entering the black
goal box should make fewer errors
than the group shocked prior to
entering the white goal box. The
results bore out the prediction quite
well.


³ In this experiment secondary
reinforcement was not shown using the
same procedure and animals trained with
water reinforcement and tested under
23 hours’ water deprivation. The
possible reasons for this, as well as
pertinent data, are presented at length
in the original report.

In three later experiments with the 
same basic design, Buchanan (1958) 
found that (a) rats would “increase
their tendency to approach cues
contiguous with escape from a
fear-producing situation, as well as
those contiguous with escape from
shock”; (b) “the approach tendencies,
acquired by hungry rats during training
to cues associated with shock reduction
and hunger reduction, were not
appreciably affected by changes in
the drive conditions of hunger and
fear between training and testing”;
and (c) “shock reduction and hunger
reduction were approximately equal
in their effects on the strength of
acquired tendencies to approach
associated cues, and that the drives of
hunger and shock and/or their
respective incentives combine in some
fashion in the development of these
approach tendencies” (p. 362).

In a similar kind of study, Nefzger 
(1957) trained rats to run across a 
grid into a distinctive end box. He 
hypothesized that as training pro- 
gressed, the animals should show an 
increasing preference for this end box 
if it were on one arm of a T maze. He 
recorded no change of preference 
with repeated testing, however, and 
was unable to duplicate the Smith
and Buchanan results when response 
elicitation by the goal box itself was 
controlled. 

The problem of controlling for re- 
sponse elicitation during tests for 
secondary reinforcement is important 
for a number of experimental designs 
which utilize the same response in 
both training and testing.⁴ By
“elicitation” we refer to the capacity
of a stimulus to evoke or otherwise
exert discriminative control over a
response. The “reinforcing” property
of a stimulus refers to its capacity to
be effective in fixating or otherwise to
prolong responding in some manner.
The question raised by an experiment
such as Smith and Buchanan’s is
whether the black goal box in the T 
maze is eliciting an earlier-learned 
approach response (the goal box 
being visible from the choice point), 
or whether it is reinforcing some new 
turning response. If we are testing 
the hypothesis that reinforcement 
can be demonstrated in such a situa- 
tion we must accept the eliciting in- 
terpretation over the reinforcing in- 
terpretation in case of any doubt. 
Buchanan, in his later experiments, 
did in fact abandon secondary rein- 
forcement as the sole interpretation 
of his results and referred to the 
“eliciting and/or reinforcing”
properties of the goal box. McGuigan
(1956) has also discussed this general 
problem in relation to Hull’s treat- 
ment of secondary reinforcement. 

Head turning. Coppock (1950,
1951) obtained results indicative of
the establishment of secondary
reinforcement, pairing a blinking light
with tail-shock termination. In his
tests, whenever the rats had their
heads turned 22° to a particular side
they were continuously reinforced by
the blinking light. The results were
weakly positive, however, only for
those animals which (a) had the light
following shock termination, and (b)
were reinforced with the head on the
initially nonpreferred side. The
effect thus appears to be very unstable,
if real at all, and the use of the
duration measure of head position is
open to the same ambiguity as in
Barlow’s experiment, i.e., the blinking
light might be producing fearful
“freezing” of the head in the position.


⁴ Here we are concerned with the logical
problem of interpreting experimental
results in an ambiguous situation. In a
later section we shall consider in
detail the evidence related to the
so-called “discriminative stimulus
hypothesis” of secondary reinforcement.

Key-nosing. Crowder (1958) used 
a tail-shock apparatus with rats in 
several different experimental de- 
signs, all of which were characterized 
by very precise control of the shock, 
conducting tests for secondary rein- 
forcement with the shock on, and 
using the pushing of a nose-key 
(similar to a Gerbrands pigeon key) 
in the front of the apparatus as an 
operant response. He found in one 
study that a light repeatedly paired 
with the termination of inescapable 
shock did not later have a significant 
reinforcing effect on the nosing re- 
sponse. 


Response Extinction 


The familiar model for this type of 
design is the comparison of rate of ex- 
tinction with and without the pre- 
sentation of a secondary reinforcer 
following responses during extinction.
Bugelski’s (1938) experiment is the
prototype. 

Crowder (1958), with his tail-shock 
and nose-key apparatus, gradually 
increased shock to a maximum in- 
tensity over a 25-second interval. 
The first nosing response after the
shock reached its peak was immediately
followed by a 0.5-second presentation
of light, then the shock was
terminated. In both extinction and 
reconditioning the presentation of 
this light as a reinforcer significantly 
increased response rate. However, a 
modification of this procedure in 
which the shock during training came 
on instantaneously to full intensity 
produced completely negative results. 
Crowder’s positive results seem to be 
the best indication of secondary rein- 
forcement among all the studies re- 
viewed, and his technique of gradu- 
ally increasing shock intensity ap- 
pears promising, though possibly 
having little effect other than adapt- 
ing the animals to the pain. 

Murphy, Miller, and Brown (1958) 
studied the extinction of an avoid- 
ance response. During training a 
light followed barrier-jumping re- 
sponses to the shock and the CS. In 
extinction, with only the CS pre- 
sented, it was found that the presen- 
tation of the light after each response 
prolonged extinction very markedly 
and the authors interpreted this to 
mean that secondary reinforcement 
had been demonstrated with pain and 
fear reduction as primary reinforcers. 
On the other hand, we are again faced 
with the awkward problem of back- 
ward conditioning (the light followed 
response and reinforcement during 
training) and the criticisms of Bar- 
low's experiment hold here, also. It 
seems just as reasonable to argue that 
the light had become aversive during 
training and retarded extinction by 
keeping the general level of fear high 
rather than serving as positive rein- 
forcement. A paper by Seeman and 
Greenberg (1955) is directly relevant 
to this experiment, as well as a brief 
report by Bender (1955). 

Beck* trained five rats to escape
from shock by running through the
lighted member of a pair of adjoining
plexiglass panels. Each animal was
then put into the apparatus for 15
minutes with both panels locked and
dark (no shock). Pushing against one
of the panels always produced for 0.5
second the light which had been the
positive stimulus, but escape was not
permitted. Pushing against the other
panel did not produce any illumina-
tion change. As predicted, (a) the
subjects pushed the panel on the rein-
forced side significantly more than on
the nonreinforced side, and (b) the
percentage of total responses to the
reinforced side was significantly
greater than that of a control group
trained without a cue stimulus. It is
still possible, however, that when the
light came on it was eliciting further
responses rather than reinforcing a
certain position habit.

* Unpublished research, University of Illi-
nois, 1957.


Delay of Reinforcement 


In a fourth experiment, Crowder 
(1958) used a light to bridge a delay 
between the nosing response and 
shock termination. Shock came on 
immediately at full intensity, then 
when the rat pushed the nose-key 
there was a 2-second onset of light, 
following which the shock was ter- 
minated. Eighty-five trials with this 
procedure did not result in signifi- 
cantly shorter latencies in occurrence 
of the response after shock onset than 
did a 2-second delay of reinforcement 
without the light. Both conditions 
were vastly inferior to immediate 
reinforcement. 


Response Facilitation 


Lee (1951) taught three groups of 
rats to bar press for food, then asso- 
ciated a tone with shock in a tail-
shock apparatus. In one group the
tone was associated with the onset of 
shock, with the termination of shock
in a second group, and with no shock
at all in a third. These groups were 
now tested by pairing the tone with 
the previously-learned bar pressing
response. According to his hypoth- 
esis, the group having tone associated 
with shock termination should press 
the most; the group without tone- 
shock experience an intermediate 
amount; the group with tone asso- 
ciated with shock onset should press 
the least. As it happened, pairing the 
tone with shock termination inhibited 
responding as much as pairing it with 
shock onset. 

Mowrer and Aiken (1954) obtained 
results similar to Lee’s. They paired 
a blinking light with shock onset and 
termination in various temporal se- 
quences and found that after the light 
had been contiguous with shock ter- 
mination (either just before or just 
after) its presentation inhibited bar 
pressing for food reinforcement. 
Mowrer and Aiken suggest that the 
light did not facilitate bar pressing 
because the animals were not afraid 
at the time of testing, i.e., the rele- 
vant motivating condition was not 
operative at the time of testing. 


Preference Testing


With this paradigm, a distinctive 
environment is associated with escape 
from shock. This environment is 
then matched with some other stim- 
ulus context and the subjects’ prefer- 
ences are recorded. 

After training their animals to es- 
cape from shock by running to a non- 
shock escape chamber, Goodson and 
Brownstein (1955) found that the 
escape chamber was preferred to 
either the shock compartment or a 
neutral compartment. Again, how- 
ever, the test situation was so con- 
trived that the results can be inter- 
preted in terms of the elicitation of 
the previously learned escape re-
sponse. During training the animals
learned to run away from the shock 
box and into the escape box. Both of 
these aspects of running were rein- 
forced by shock termination. In test- 
ing, the animals were put into an 
alleyway between two closed doors. 
These doors were simultaneously 
raised so that the animal was faced 
with the same situation encountered 
in training: an open door into the 
escape chamber, with all its eliciting 
cues, was present. The response 
scored, running into the escape com- 
partment, was exactly the same re- 
sponse on which the animal had been 
trained. 

Montgomery and Galton (1956) 
eliminated the testing ambiguity 
found in the Goodson and Brown- 
stein experiment, but introduced an- 
other confusion. After placing a rat 
in a small plexiglass “trolley car,”
in one apparatus compartment shock 
was turned on and remained on while 
the animal was pulled into a second 
compartment, where it terminated. 
After a number of such trials the 
animal was put into the two-com- 
partment situation for unrestricted 
running and time spent in the two 
compartments was recorded. Un- 
fortunately, the fact that the sub- 
jects preferred the side where shock 
terminated can be interpreted to 
mean that they were avoiding the 
fear-arousing side. This is a phenom- 
enon so well-established that there is 
no necessity for talking about second- 
ary reinforcement. 

A preference-testing experiment
which would give clear-cut results
would combine the acquisition pro-
cedure used in the Montgomery and
Galton experiment and the test pro-
cedure of the Goodson and Brown-
stein experiment. Since in Mont-
gomery’s procedure the animals are
transported from one compartment
to another no particular running re-
sponse is learned; hence, such a habit
could not manifest itself during test- 
ing and be confounded with a demon- 
stration of secondary reinforcement. 
Goodson’s test procedure of pairing 
a neutral box with both the shock box 
and escape box in preference tests 
should show whether the animals are 
simply avoiding the shock compart- 
ment or have learned a preference for 
the escape side. Gleitman (1955) re- 
ports a study which is, in principle, 
the same as this “ideal” experiment.
Rats were placed in a transparent 
cable car with a grid floor. Shock was 
turned on in one part of the experi- 
mental room, continued while the 
rats were transported via cable to an- 
other part of the room, then turned 
off. The subjects were divided into 
three groups for the testing of pref- 
erences and it was found that the 
termination place was preferred to 
the shock-onset point, there was no 
preference between a neutral place 
and termination point, and there was 
no preference between a neutral place 
and shock-onset locus. The ambigu- 
ity in these results is the failure to 
show a clear approach or avoidance 
tendency in the groups having the 
neutral place as a choice. Two pro- 
cedural aspects of the experiment 
which might weaken any interpreta- 
tion placed on it are the facts that 
(a) because the animals were run in 
an “open” room there may have been
present what Mowrer (1959) has 
called “ambiguous cues,” stimuli 
associated with both the onset and 
termination of shock; and (b) there
were 11 experimenters, students in an 
experimental laboratory. As we shall 
see later, however, even a perfectly 
controlled experiment with this de- 
sign might fail to give positive re- 
sults, for it may not be possible to 
establish secondary reinforcement
without the animals making a dis-
criminative response during training.


Summary 


The experimental literature shows 
a variety of tests of the hypothesis 
that the termination of aversive 
stimulation can be used as the pri- 
mary reinforcer in establishing a 
secondary reinforcer, but there are 
few positive results. In those in- 
stances where there has been a clear 
experimental effect the interpreta- 
tion is generally confounded such 
that the concept of secondary rein- 
forcement need not be invoked. Only
one experiment (Crowder) seems to 
be unambiguously positive, with a 
highly significant effect, but in view 
of his own negative results in other 
experiments, as well as the rest of the 
literature, this does not provide an 
undue amount of faith that the 
phenomenon exists. We must now 
ask whether this predominantly nega- 
tive evidence is sufficiently strong to 
refute the theoretical positions which 
predict that secondary reinforcement 
should be established under such con- 
ditions or whether the explanations 
for the experimental failures are to be 
found in the experiments themselves. 
Toward this end a consideration of 
two variables related to secondary 
reinforcement becomes appropriate: 
namely, discrimination training and 
motivation. 


SECONDARY REINFORCEMENT 
AND DISCRIMINATION 
TRAINING 


Discriminative Stimulus Hypothesis of 
Secondary Reinforcement 


One of the difficulties involved in 
trying to interpret the experimental 
literature in the previous section was 
the lack of differentiation between 
the eliciting and reinforcing functions
of stimuli. There, we considered only
the logical problem that in many 
situations the repeated occurrence of 
a response could be attributed to 
its elicitation by some previously- 
learned cue and that the concept of 
secondary reinforcement was there- 
fore superfluous. The black goal box 
visible from the choice point of a T 
maze is a case in point. The question 
to be considered now is somewhat 
different, to wit: what is the empirical 
relationship between the eliciting 
and reinforcing functions of stimuli? 
Specifically, can a stimulus serve as a 
secondary reinforcer without first 
having cue function? 

Keller and Schoenfeld (1950, p. 
236) have stated this discriminative 
stimulus hypothesis in firm terms: 
“In order to act as an Sʳ for any re-
sponse a stimulus must have status as
an Sᴰ for some response.” This cue
function is established through a 
process of differential reinforcement, 
reinforcing a response in the presence 
of a stimulus and not in its absence. 
The importance of such a procedure 
in discrimination training has been 
shown by Ferster (1951), for example, 
who found that a stimulus continu- 
ously present during both reinforce- 
ment and nonreinforcement did not 
have the properties of a discrimina- 
tive stimulus, i.e., had no control over 
behavior. In order to clarify the na- 
ture of the establishment of a dis- 
criminative stimulus, its use as a 
secondary reinforcer, and the nature 
of the problem of the interaction of 
these two properties, a brief review 
of the typical Skinner-box training 
procedure is in order before examin- 
ing experimental results. 

The procedure is generally as fol- 
lows: (a) An animal is trained to eat 
from a food magazine. (b) It is
trained to eat in the presence of a cer-
tain stimulus, such as a light, or im-
mediately following some stimulus,
such as the click of the food delivery
mechanism. Such stimuli are referred
to as discriminative stimuli (symbo-
lized by Sᴰ) and are the same as the
positive stimuli in any discrimination 
training situation. When the dis- 
criminative stimulus is not present, 
nor has just occurred (depending on 
the kind of training situation), the 
animal is not reinforced for going to 
the magazine. (c) A bar is introduced 
for the animal to press. As soon as it 
is pressed, Sᴰ follows immediately
and the animal can go to the maga-
zine and eat. The animal is then ex-
tinguished on bar pressing, with or
without the Sᴰ following the response,
and the occurrence of Sᴰ is found to
increase resistance to extinction. As
an alternative procedure the animal
may originally learn to press the bar
with only the Sᴰ as reinforcement.
Under either of these conditions the 
discriminative stimulus is referred to 
as a secondary reinforcer. 

Most of the experimental tests of 
the discriminative stimulus hypothe- 
sis have been positive. Schoenfeld, 
Antonitis, and Bersh (1950) found
that the mere temporal contiguity of 
a stimulus with some reinforcer was 
not sufficient to establish this as a 
secondary reinforcer (compare with 
Ferster above, also). In their study, 
a light was associated with the con- 
sumption of food pellets, but not with 
obtaining them. After 100 such asso- 
ciations, the light did not increase the 
rate of bar pressing above operant 
level. 

Dinsmoor (1950) studied the dis- 
criminative stimulus—secondary re- 
inforcement relationship with an ex- 
tinction procedure. After training 
rats on bar pressing, half were ex- 
tinguished with the presentation of 
Sᴰ as reinforcement following re-
sponses. For the other half, the bar
was removed and the experimenter
presented the Sᴰ without food the
same number of times that it had 
occurred in the first group. When the 
bar was again made available to the 
second group it was found to have ex- 
tinguished on bar pressing as much as 
the first group, i.e., response rate was 
reduced the same amount. Extinc-
tion was carried further, with “cue”
and “reinforcing” functions of the Sᴰ
reversed for the two groups and no 
differential rate of extinction was 
found now either, a clear demonstra- 
tion of the intimacy of these two prop- 
erties of stimuli. Coate (1956) repli- 
cated part of Dinsmoor’s experiment 
with the same results. 

Notterman (1951) studied second- 
ary reinforcement as a function of 
amount of discrimination training. 
Interspersing a varying number of 
nonreinforced trials in which Sᴰ was
absent among a constant number of
reinforced trials in which Sᴰ was pres-
ent, this investigator found that the
more strongly the discrimination was
thus developed the greater secondary
reinforcing power the Sᴰ had. Mc-
Guigan and Crockett (1957, 1958),
Webb and Nolan (1953), and Wike 
and McNamara (1956) report similar 
results. 

These experiments indicate, then, 
that (a) with better discrimination 
training secondary reinforcement is 
stronger, and (b) when either the cue
or reinforcing function of a stimulus 
is extinguished, the alternate func- 
tion also declines. The rationale for 
maintaining the distinction then is 
the way in which the stimulus is used, 
its temporal relationship to a particu- 
lar bit of behavior. Keller and 
Schoenfeld have clearly made this 
point, also. Contrary to these posi- 
tive results, however, there are three 
reports of experiments which appar- 
ently contradict the hypothesis. 


Rozeboom (1957) was unable to 
replicate the Dinsmoor-Coate re- 
sults, and even got little depression of 
bar pressing after pairing the Sᴰ with
shock during the phase when the re-
sponse to the Sᴰ was being extin-
guished. He does not report any data
from the extinction period, unfortu- 
nately. In both Rozeboom’s experi- 
ment and Wyckoff’s (reported below) 
a somewhat unusual procedure was 
used. Rather than food reinforce-
ment, water reinforcement was used,
delivered by a dipper which was
normally up and lowered into a reser- 
voir for water at the appropriate 
time. Rozeboom’s latent extinction 
procedure involved having the dipper 
mechanism operate without water 
during the cue extinction period. The 
subjects could still make the licking 
response to the dry dipper during the 
latent extinction period. 

Wyckoff, Sidowski, and Chambliss 
(1958) trained their rats to approach 
and lick the dry dipper when a buzzer 
sounded, whereupon the dipper de- 
livered water. After this training, a 
bar was inserted into the side of the 
box opposite the dipper and animals 
for whom the buzzer was contiguous 
with bar presses failed to make more 
responses than animals for whom the 
buzzer sounded automatically follow- 
ing 10 seconds of no bar pressing, the 
dipper no longer operating in either 
case. While the authors themselves 
felt “no inclination to reject the con-
cept of secondary reinforcement,” 
they did believe that some crucial 
condition in the establishment of 
secondary reinforcement remains un- 
specified. In this experiment, the 
buzzer was not directly associated 
with water presentation and a con- 
summatory response (licking water 
from a dipper), but instead was 
paired with an operant response 
(licking dry dipper), which was simi-
lar to the consummatory response.
Perhaps the crucial difference sought 
for by Wyckoff is related to this fact,
a suggestion made promising by
Rozeboom’s negative results with the 
dry dipper technique. 

On the basis of the Wyckoff study, 
Myers (1958) suggests that many bar 
pressing experiments thought to have 
shown secondary reinforcement may 
have really shown only heightened 
activity following the presentation of 
Sᴰ. When this was controlled, “sec-
ondary reinforcement” no longer ap-
peared. Direct experimental evidence
in support of this contention (which 
Myers did not present) can be found. 
Both Walker (1942) and Estes (1948) 
found that when an Sᴰ established
under Condition X was periodically 
presented under Condition Y it in- 
creased the rate of a response with 
which it had never before been di- 
rectly associated. Gilbert and Sturdi- 
vant (1958) report similar results. It 
would not be surprising then to find 
that an Sᴰ would facilitate a response
in the same situation where it had 
been associated with that response. 
Zimmerman (1957, 1959), however, 
contrary to Wyckoff, has obtained 
literally thousands of responses from 
his animals with nothing but second- 
ary reinforcement. In the same kind 
of control group as used by Wyckoff 
responding was very low, no more 
than operant level. It would seem 
valuable to repeat the Wyckoff ex- 
periment to ascertain just what vari- 
ables are operating. Wyckoff himself 
seems not to be dismayed by it all, for 
he has since attempted to develop a 
quantitative theory of secondary 
reinforcement (1959) using “cue
strength” as the main variable in- 
fluencing the secondary reinforcing 
capacity of a stimulus. In any event, 
these results would not seem to in- 
fluence the interpretation of experi-
ments using such apparatuses as
straight runway or T maze, where 
trials are spaced. 

The third contradictory report is 
Ratner’s (1956). After training his 
rats to approach a hopper at the 
sound of a click he introduced a bar 
into the box. Animals for whom the 
bar pressing was followed by the 
click made more presses than a no- 
click group, but did not go to the 
hopper more often. Ratner suggests 
that although the click was reinforc-
ing it was not an Sᴰ for goal-ap-
proaching because the animals went
to the goal box only a small propor- 
tion of the time that the click was 
presented (about 20% on the first 
day). Often, however, either rats or 
pigeons will not go to the goal after 
every response on a schedule of 100% 
primary reinforcement. In fact, one 
of the things we expect a reinforcer to 
do is to “strengthen a habit” so that
the habit will maintain itself without 
external reinforcement. In addition, 
in Ratner’s situation other cues, such 
as the sight or sound of the food being 
delivered, may also have been impor-
tant as Sᴰ in control of goal-ap-
proaching and these may have been
absent during testing. This briefly 
reported experiment is suggestive, 
but inconclusive. 

In view of the total evidence, it 
seems that something about the na- 
ture of discrimination training is im- 
portant for the establishment of 
secondary reinforcement. While we 
cannot here go into the problem of 
the possible underlying mechanisms 
we are inclined to take the evidence 
at its face value. Since it may be pos- 
sible to obtain secondary reinforce- 
ment without prior discrimination 
training and an Sᴰ does not seem to
be guaranteed as a reinforcer we
cannot accept cue function as prima
facie evidence for secondary re-
inforcement. It seems clear, how-
ever, that such training generally
does make secondary reinforcement
stronger and to this extent the Sᴰ
hypothesis may be considered as 
“correct,” providing us with some 
empirical basis for analyzing the 
failure of some of the shock-termina- 
tion experiments. 


Discrimination Training and the
Shock-Termination Problem


By and large, the shock-termina- 
tion experiments have been based on 
a Hullian-type assumption that the 
sufficient condition for establishing 
secondary reinforcement is pairing a 
neutral stimulus with some “pri-
mary” or other “secondary” rein-
forcer (see McGuigan, 1956). There 
is no specific statement of prior dis- 
crimination training in this assump- 
tion (although Hull has written of the 
eliciting properties of the secondary 
reinforcer), and in none of the experi- 
ments thus far reported, save those of 
Smith and Buchanan and of Beck, 
has there been any attempt to use dis- 
crimination training. Therefore, it 
seems that the majority of negative 
and questionable results cannot 
necessarily be considered as evidence 
that the phenomenon does not exist, 
or as clearly opposing drive-reduc- 
tion theory. Rather, they can be con- 
sidered as evidence against only a 
particular statement as to how sec- 
ondary reinforcement can be estab- 
lished in conjunction with drive-re- 
duction reinforcement, i.e., that the 
simple pairing of a neutral stimulus 
and drive reduction is sufficient. If 
this statement is incomplete, and the 
evidence indicates that it is, then the 
experiments have hardly touched 
upon the main problem we are con- 
sidering—whether secondary rein- 
forcement can be established at all 
using shock termination as primary
reinforcement. Granting that the
procedures used with food or water 
may not necessarily be correct for 
use with the termination of noxious 
drives, they still provide the best di- 
rection for research. 


SECONDARY REINFORCEMENT 
AND MOTIVATION 


Recalling Mowrer and Aiken's sug- 
gestion that perhaps they failed to 
obtain secondary reinforcement be- 
cause their animals were not appro- 
priately motivated during testing, 
we look back over the other experi- 
ments and see that only in Crowder’s 
and Beck’s experiments were the 
animals shocked during testing. On 
the other hand, in experiments with 
hunger and thirst the subjects are 
tested for secondary reinforcement 
under the same deprivation condi- 
tions with which they were trained. 
The exception to this occurs, of
course, in the few experiments which
have studied secondary reinforce-
ment as a function of motivation. 
Brown (1956) found in tests for 
secondary reinforcement that there 
was no interaction between hunger 
level and amount of responding in re- 
inforcement and nonreinforcement 
groups. The secondary reinforcement 
group responded equally more than 
the control group at both “high” and
“low” drive levels. She suggests that 
satiated animals getting the second- 
ary reinforcer might very likely have 
not given any indication of effective- 
ness of the reinforcing stimulus, but 
her low-drive group was not satiated. 
Miles (1956) obtained similar re- 
sults. After training his rats on bar 
pressing under 23 hours’ food dep- 
rivation he gave them extinction 
testing under 0, 2.5, 5, 10, 20, and 40 
hours of deprivation. At each drive 
level the experimental group was 
superior to a comparably trained and
deprived control group which did not
get the secondary reinforcer. Like 
Brown, he found no regular trend for 
the difference between experimental 
and control groups to increase as a 
function of deprivation time, al- 
though the three shortest deprivation 
groups showed less difference than 
the three highest. The experimental- 
control differences were not signifi- 
cant at the shorter deprivation in- 
tervals, but the overall functions 
were. 

Oakes (1956), on the other hand, 
did find an interaction. Varying Sᴰ
presentation and food deprivation 
time in a factorial design, he found 
that both of these variables influence 
straight runway performance. His 
high-drive group with cue reinforce- 
ment ran faster than either the low- 
drive group with cue or the high-drive
group without cue reinforcement. 

Wike and Casey (1954) claim to
have demonstrated the secondary 
reinforcing property of food for 
satiated animals, finding that the 
satiated animals which got food 
(which they did not eat) in the goal 
box ran a straight runway faster than 
rats which did not get pellets in the 
goal box. Unfortunately, as these 
writers themselves report, there was 
no control for the effect of simply 
manipulating something in the goal 
box, for example, nonedible objects. 

Schlosberg and Pratt (1956) got 
very marked results indicating the 
difficulty of demonstrating secondary 
reinforcement with satiated subjects. 
Rats under 23 hours’ deprivation 
showed a consistent preference for the 
side of a T maze where they could see 
and smell food, but not eat it. When 
run while satiated the preference re- 
duced to chance, only to return im- 
mediately to its former high level 
when the rats were again deprived. 
Rats initially run in the maze while
satiated showed only chance prefer-
ences, and when switched to depriva- 
tion took as long to display the 
preference as did the first group, run 
deprived from the start. The authors 
conclude that hunger was necessary 
for both learning and maintaining the 
preference. 

In a recent study Grice and Dyal*
gave three groups of rats 110 click- 
food pairings per animal while the 
subjects were under 23 hours’ food 
deprivation. After this training a 
bar was introduced into the appara- 
tus and the animals were tested for 
30 minutes. There was a very signif- 
icant reinforcement effect between 
a 23-hour deprived-click-reinforce- 
ment group and a 23-hour no-click 
group, the means being 56.12 and 
17.12. However, a satiated-click 
group gave almost twice as many re- 
sponses as the hungry-no-click group, 
a mean of 30.88. This suggests that 
while secondary reinforcement may 
be obtainable with satiated subjects, 
it is certainly more powerful with de- 
prived. 
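
The comparison among the three group means can be checked with simple arithmetic. The sketch below uses only the means reported above; the group labels are illustrative names, and since no group sizes or variances are given here, the ratios say nothing about statistical significance.

```python
# Mean bar presses in the 30-minute test, as reported in the text.
# The dictionary keys are illustrative labels, not the authors' terms.
means = {
    "deprived_click": 56.12,
    "deprived_no_click": 17.12,
    "satiated_click": 30.88,
}

# Express every group relative to the hungry no-click control.
baseline = means["deprived_no_click"]
ratios = {group: value / baseline for group, value in means.items()}

for group, ratio in ratios.items():
    print(f"{group}: {ratio:.2f} times the hungry-no-click mean")
```

On these figures the satiated-click group gave about 1.8 times the control mean, which matches the description "almost twice as many," while the deprived-click group gave more than three times the control mean.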

Seward and Levy (1953) obtained 
results difficult to interpret. Rats 
given food reinforcement on one side 
of a T maze continued to show a 
preference for this side when run 
satiated. On the other hand, with 
repeated training and testing they 
showed no increasing preference for 
the food side as would be expected if 
secondary reinforcement were operat- 
ing. 

These various experiments on the 
relationship of drive level and sec- 
ondary reinforcement, while few in 
number and sometimes contradic- 
tory, strongly suggest that any such 
effect obtained with satiated animals 
would be weak. Only Grice and
Dyal obtained a difference even close
to significance with satiated animals,
excepting the doubtful results of
Seward and Levy and of Wike and
Casey. If we now assume that mo-
tivation is an important variable and
look at the problem of the shock-
termination experiments again we
first have to determine the condition
equivalent to “satiation.”

* Unpublished research, University of Illi-
nois, 1957.


Satiation in Shock Experiments 


As the term is generally used, 
“satiation” refers to the absence of 
some motivating condition, or the
“complete gratification” of a need.
In practice, this means that we have 
induced an animal to eat or drink as 
much as it will so that we are rea- 
sonably assured that it does not 
“need” food or water. In the shock
situation satiation should then refer 
to the absence of shock. There is 
another component of shock situa- 
tions, however, namely, fear. Thus, 
even though shock were not present 
while testing in the same apparatus 
used in training there might be con- 
siderable motivation and we would 
not think of the subjects as satiated. 
Whether this fear component would 
be a powerful enough motivator to 
demonstrate secondary reinforcement 
clearly, assuming its demonstrability, 
would presumably be dependent 
upon such parameters as strength 
and number of shocks during train-
ing. Since Miller (1951) has shown 
that animals will learn a variety of 
responses with escape-from-fear rein- 
forcement, we might expect that this 
same motivation would be potent 
enough to demonstrate secondary re- 
inforcement. Schoenfeld (1950) be- 
lieves that it can be so demonstrated, 
having hypothesized that in avoid- 
ance learning the animals continue 
to make the avoidant response be- 
cause the proprioceptive stimuli as- 


sociated therewith have taken on 
secondary reinforcing properties. 

We might get satiation on the 
other hand, if we tested the subjects 
in a completely different apparatus, 
as, for example, in the Littman and 
Wade experiment. Mason,⁷ how-
ever, has made the suggestion that in
this case the stimulus which was 
supposed to be reinforcing might 
have the opposite effect and arouse 
fear, inhibiting responses on account 
of its prior association with the 
shock situation. This is an especially 
interesting argument, for its im- 
plication to the drive reductionist is 
that a signal associated with shock 
termination will arouse drive in a 
nonshock situation and reduce it 
when the organism is pained or fear- 
ful. One would not then expect to 
get positive results in a situation 
like that of Littman and Wade, but 
would expect results like those of Lee 
and of Mowrer and Aiken. 


Effects of Very Strong Shock 
During Testing 


Thus far we have been considering 
the effects of very low motivation or 
complete absence of motivation dur- 
ing tests for secondary reinforce- 
ment. What about the converse, can 
there be too much motivation during 
testing? None of the hunger or 
thirst experiments concerned with 
secondary reinforcement and drive 
have apparently had the subjects too 
highly motivated, but shock can 
easily produce an excitation level far 
beyond that of appetitive drives and 
has been shown to have ill-effects on 
performance (for example, as far back 
as Yerkes & Dodson, 1908). How 
does this deleterious effect of very 
strong motivation apply in the shock
situation we have been considering?

⁷ D. J. Mason, Personal communication,
1956.

Competing responses. As men- 
tioned previously, shock may arouse 
responses in competition with or 
facilitating the response being meas- 
ured. The combination of these op- 
posite effects in a single experiment 
would tend to wash out experimental 
differences. 

Psychophysics of drive-reduction re- 
inforcement. A second problem may 
be even more serious since it strikes 
at the nature of the mechanism of 
secondary reinforcement. Let us as- 
sume momentarily that the basis of 
secondary reinforcement is some 
form of anticipatory drive reduction. 
Campbell and Kraeling (1953) have 
shown that the effectiveness of shock 
reduction as a reinforcer is a func- 
tion of the proportion of the total 
shock reduced, not just the absolute 
amount of reduction. This approxi-
mates a Weber fraction, which Camp-
bell (1955) has shown even more
clearly with sound reduction. If,
then, an animal is being shocked dur-
ing a test for secondary reinforce-
ment, the amount of anticipatory
drive reduction induced by a sec- 
ondary reinforcer could well be less 
than the differential threshold for re- 
inforcement. What might potentially 
be a “good” secondary reinforcer
could reduce such a small propor- 
tion of the total drive in this situa- 
tion as to be ineffective. We could 
therefore tell if the reinforcement 
“threshold” were reached only if re- 
inforcement were demonstrated. If 
it were not demonstrated one could 
argue that the threshold had not 
been reached and the hypothesis was 
not disproved at all. The way to 
break out of this circularity would 
seem to be a careful study of a variety 
of shock and/or fear levels. The a 
priori selection of a shock level for 
testing would not seem to be ade-
quate, even though the test shock
were the same as in training, because 
the termination of a given shock in- 
tensity might be an effective rein- 
forcer for discrimination training 
but the shock inappropriate for con- 
tinual use during tests for secondary 
reinforcement. 
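The proportionality argument above can be made concrete with a small numerical sketch. The function names, the arbitrary drive units, and the threshold fraction of 0.10 are hypothetical and chosen only for illustration; none of these values come from Campbell and Kraeling's data.

```python
def proportional_reduction(total_drive, anticipatory_reduction):
    """Fraction of the total drive removed by the hypothesized secondary reinforcer."""
    if total_drive <= 0:
        raise ValueError("total_drive must be positive")
    return anticipatory_reduction / total_drive


def exceeds_threshold(total_drive, anticipatory_reduction, weber_fraction):
    """True when the proportional reduction reaches the assumed differential threshold."""
    return proportional_reduction(total_drive, anticipatory_reduction) >= weber_fraction


# Hypothetical values: the same absolute reduction (2 units) under weak and strong shock.
weak_shock = exceeds_threshold(10.0, 2.0, 0.10)    # proportion 0.20
strong_shock = exceeds_threshold(50.0, 2.0, 0.10)  # proportion 0.04
```

Under these assumed numbers the same cue would count as an effective reinforcer at the weak shock level and as an ineffective one at the strong level, which is why a range of test intensities would have to be sampled.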


Motivation During Training 


The role of motivation during dis- 
crimination training is not con- 
sidered in detail because one can 
observe and measure discrimination 
performance with enough accuracy 
to tell when a discriminative response 
has been well-learned. There is also 
good evidence that shock intensity is 
relatively unimportant if the dis- 
crimination to be learned is simple 
(for example, Hammes, 1956). 

Summarizing these various lines of 
evidence, then, it may be suggested 
that some amount of aversive mo- 
tivation will probably be necessary to 
demonstrate secondary reinforce-
ment derived through association
with pain reduction. The precise 
level is a matter to be determined 
empirically, but intensities either too 
high or too low may negate other- 
wise positive results. 


SOME SUGGESTED EXPERIMENTAL
APPROACHES 

Only Crowder, to the writer’s 
knowledge, has really attempted the 
“obvious” experiment, using the de- 
sign that Bugelski used over 20 years 
ago—extinguishing a response with 
and without the use of an hypothe- 
sized secondary reinforcer. Crowder’s 
results suggest that this might be a 
fruitful approach, especially since he 
apparently did get positive results 
even without discrimination train- 
ing. A powerful procedural advance 
would be an extensive use of partial 
reinforcement, particularly the meth-
od that Zimmerman (1957, 1959) has
reported for use with food and water 
reinforcement. With this technique, 
the $S^D$ follows bar pressing only part
of the time, and reinforcement fol-
lows the $S^D$ only a fraction of the
time that it occurs. Using grid shock 
in conjunction with a discrimination- 
training procedure and bar press 
training, one might produce satisfac- 
tory secondary reinforcement during 
extinction with the cue stimulus as 
the reinforcer. 

Fear motivation might be even 
more effective than direct shock, for 
many of the problems encountered 
with grid shock could be avoided. 
Suppose that we put a rat into a 
shock compartment with a hinged 
door in one of the walls. We block 
off this door and shock the animal 
severely, à la Miller. Now we turn
off the shock and make the door
available, gradually training the ani-
mal in repeated trials to push open
the door and run out of the shock
compartment to a safe compartment.
We run 10 such trials a day, shock- 
ing the animal before training each 
day to insure that the level of fear 
is high. We now introduce an $S^D$,
such as a buzzer. The rat is put in
the box and the panel is locked, not
to be opened until the buzzer sounds.
Soon the animal learns not to push
the door until the signal is presented.
Then, à la Zimmerman, we put this
panel pushing on a partial schedule, 
such that when the buzzer sounds 
the animal does not always get to 
escape when he pushes on cue. 
Rather, the buzzer ceases when the 
animal responds, then comes on 
again after another minute or so. 
We slowly build up this ratio so that 
the buzzer sounds several times be- 
fore the animal finally is reinforced. 
Now comes the test period. Every-
thing is the same on this day except
that there is a bar in the box, which
when pressed sounds the buzzer for 
a period of time, but the animal is 
not allowed to escape. We can thus 
test the reinforcing capacity of the 
buzzer in the same manner that we 
test for secondary reinforcement with 
appetitive motivation. In an
analogous manner we could have in-
troduced the bar during training it-
self, then tested for secondary rein-
forcement by presenting the buzzer
(but no escape) during extinction. 
The critical aspects of either of 
these designs, in view of the foregoing 
discussions, are that (a) the buzzer 
is established as a discriminative 
stimulus, contiguous with escape 
from a fear-arousing situation, and 
(b) the motivational conditions dur-
ing testing are the same as those dur-
ing training.

There are a variety of aversive 
stimulus situations which could be 
used as alternatives to shock and 
fear. In a report concerning the use 
of cold stimulation and heat rein- 
forcement, for example, Carlton and 
Marks (1957) report that it was very 
difficult to establish a stable bar 
pressing rate unless a cue stimulus 
preceded the onset of the heat. They 
interpret this to mean that the cue is 
serving as a secondary reinforcer. 
While this may not be a stringent 
test of reinforcement, the technique 
does seem readily amenable to more 
direct tests. One might use the 
cessation of strong light or sound as 
a reinforcer in the same way. Air
deprivation and reinforcement would 
provide an interesting test, but an 
extinction procedure with the sub- 
jects deprived would rapidly become 
confounded. 


CONCLUSIONS 


This review of the experimental 
literature leads us to conclude that 
there is almost no evidence to show
that secondary reinforcement can be 
established by the association of a 
neutral stimulus with noxious-drive 
reduction. An analysis of the experi- 
ments suggests that they have not 
been completely adequate for a 
variety of reasons, depending upon 
the particular experimental designs 
used. Generally speaking, there have 
been three major problems. First, 
there has often been a lack of cer- 
tainty whether stimuli were eliciting 
previously-learned responses, or re- 
inforcing them during tests for re- 
inforcement. Second, in almost none 
of the experiments has the secondary- 
reinforcer-to-be first been established 
as a cue, although the literature 
strongly suggests that this procedure 
is advisable. Third, there has been 
relatively little consideration of the 
role of motivation during tests for 
secondary reinforcement. 

It would seem that a first step in 
attacking this problem is to design 
experiments which provide maximal
opportunity for the phenomenon to
be demonstrated. It should be de- 
termined whether secondary rein- 
forcement can be established at all, 
using methodologically sound and 
unambiguous procedures, before go- 
ing on to test different hypotheses 
about the establishment of secondary 
reinforcement. Until this is done it 
seems meaningless to use the negative 
results thus far obtained as evidence 
against a concept as general as drive 
reduction. In the event that the 
phenomenon can never be demon- 
strated we may have a finding detri- 
mental to drive-reduction theory, 
but certainly not to reinforcement 
theory since there are a number of 
alternative explanations for the op- 
eration of reinforcement. Reinforce- 
ment theory may be forced into the 
acceptance of some kind of hedonic 
axiom, however, and agree with P. T. 
Young that getting rid of something 
bad is not the same as getting some- 
thing good. 


The repeated measurements anal-
ysis of variance designs have been
popular in psychological research for 
a number of years. The advantage of 
these designs has been mainly that 
of economy, relative to number of 
subjects (Ss), but increased precision 
may result; the experimental error is 
reduced when variance due to Ss is 
removed. The simplest design of this
nature is that of the treatments ×
subjects design in which n Ss receive
all of k treatments. More complex
designs involve the latin square and
modified latin squares.

Whenever repeated measurements 
designs have been used the procedure
has usually been to counterbalance
the order of appearance of the treat- 
ments so as to avoid any practice or 
learning effect which may be present. 
In the simple case this involves hav- 
ing all Ss take the treatments in one 
order and then reversing the order on 
subsequent trials (intrasubject coun- 
terbalancing), or some Ss take the 
treatments in one order and other Ss 
receive different orders of presenta- 
tion (intersubject counterbalancing). 
A combination of these two proce- 
dures probably is used most frequent- 
ly. However, the practice effect is 
not partitioned in these analyses. In 
the more complex designs, the in- 
vestigator is able to separate a source 
of variation due to practice or order. 
In some cases the effect due to the 
order by treatment interaction can 
also be partitioned. 
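The two counterbalancing schemes just described can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the cyclic latin square is one convenient way of assigning different orders to different Ss, not a scheme prescribed by the text.

```python
def cyclic_latin_square(treatments):
    """Return k orders in which each treatment occupies each position exactly once."""
    k = len(treatments)
    return [[treatments[(row + col) % k] for col in range(k)] for row in range(k)]


def intersubject_orders(subjects, treatments):
    """Intersubject counterbalancing: successive Ss take successive rows of the square."""
    square = cyclic_latin_square(treatments)
    return {subject: square[i % len(square)] for i, subject in enumerate(subjects)}


def intrasubject_order(treatments):
    """Intrasubject counterbalancing: one order followed by its reversal (e.g., ABBA)."""
    return list(treatments) + list(reversed(treatments))


orders = intersubject_orders(["S1", "S2", "S3"], ["A", "B", "C"])
abba = intrasubject_order(["A", "B"])
```

Here `orders` assigns ABC, BCA, and CAB to the three Ss, and `abba` is the sequence A, B, B, A. As the text notes, balancing of this kind equalizes the main effects but does not by itself partition the practice effect.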

The repeated measurements de-
signs have been considered by nu-
merous individuals (e.g., Alexander,
1947; Gaito, 1958a, 1958b; Garrett 
& Zubin, 1943; Grant, 1944, 1948; 
Gourlay, 1955; Hilgard, 1951; Ko- 
gan, 1948, 1953; Lindquist, 1947, 
1953; Lubin, 1954, 1957, 1958; Mc- 
Nemar, 1951, 1955; Peters, 1944). 
Likewise, the criticisms directed at 
these designs have been numerous. 
The major indicated defect in the 
simple design is that the treatment 
effect will be confounded with any 
practice effect or, if counterbalancing 
is used, the main effects will be bal- 
anced but any practice effect will 
appear in the interaction effect, thus 
producing a negative F test bias 
(Type II error). The use of latin
squares has been criticized because 
it has been maintained that interac- 
tions must be zero for valid use of 
this design. However, papers by 
Gourlay (1955) and Gaito (1958b) 
have indicated that this assumption 
is not always required. The latter 
individual employed the expected 
value of mean square [E(MS)] con- 
cept and showed that the important 
consideration as to the suitability of 
the latin square model depends on 
the number of random variates in- 
cluded in the experiment. Work by 
mathematical statisticians (Wilk & 
Kempthorne, 1957) has also indi- 
cated that interactions do not neces- 
sarily have to be zero. 

The overall problem of repeated 
measurements designs is a complex 
one, and a satisfactory treatment has 
not been effected. However, the 
E(MS) concept (Anderson & Ban-
croft, 1952; Cornfield & Tukey,
1956; Greenwood, 1956; Kemp-
thorne, 1952; Wilk & Kempthorne,
1957) provides a suitable technique 
for a clear investigation of this prob- 
lem. The purpose of this paper is to 
extend this approach to a number of 
repeated measurements designs. To 
investigate adequately this problem 
we shall treat six cases which are 
most frequently used: (a) all Ss re-
ceive the Treatments (T) in the same
Order (O); (b) the Order of Treat-
ments is randomized for each S; (c)
the Order is balanced (assumption
that all interactions containing order
are zero); (d) the Order is balanced
and analyzed as a single latin square
(no assumptions about interaction);
(e) the Order is balanced without
interaction assumptions but ana-
lyzed by a modified latin square de-
sign (e.g., Lindquist Type II design);
and (f) the Order is balanced without
assumptions but analyzed as a simple
Treatments × Subjects design.


REPEATED MEASUREMENTS
DESIGNS


Case I. Same Order 


This represents the simplest type of
repeated measurements design. All
n Ss receive the k Treatments in the
same Order. Table 1 indicates the
E(MS). The rule for obtaining the
E(MS) in a complete factorial design
is as follows: E(MS) is $\sigma_e^2$ (variance
due to error) plus the $\sigma^2$ term whose
subscript corresponds to the main
or interaction effect of concern. It
also includes all $\sigma^2$ terms which rep-
resent interactions with this main or
interaction effect, providing the ef-
fects not included in the main or
interaction effect are all random. For
example, in a two-variable design
(A × B) in which A is a random ef-
fect, the E(MS) for B would be
$\sigma_e^2 + \sigma_b^2 + \sigma_{ab}^2$; for A, $\sigma_e^2 + \sigma_a^2$ (see An-
derson & Bancroft, 1952; Cornfield &
Tukey, 1956; Greenwood, 1956;
Kempthorne, 1952; Wilk & Kemp-
thorne, 1957). The coefficient for
$\sigma_e^2$ is 1; all other $\sigma^2$ terms have coeffi-
cients which are equal to the number
of replications (n) times the number
of the levels of the variables which
are not included in the subscript of
the $\sigma^2$ under consideration.2 How-
ever, because of confounding aspects
other components will be included in
some mean squares. These can be
determined intuitively.


TABLE 1

COMPONENTS OF VARIANCE INCLUDED IN
MEAN SQUARES FOR CASE I:
ORDER EFFECT PRESENT

Source    E(MS)
T         $\sigma_e^2 + s\sigma_t^2 + \sigma_{ts}^2 + s\sigma_o^2$
S         $\sigma_e^2 + t\sigma_s^2$
TS        $\sigma_e^2 + \sigma_{ts}^2$

Note. In Tables 1-6 all main effects are fixed except
S, which is random.
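The rule for a complete factorial design can be written as a short routine. The following is a minimal sketch under stated assumptions: the function name and the string labels are hypothetical, coefficients are omitted, and the additional components that arise from confounding (such as the Order term in Case I) are not generated, since the text says these must be determined separately.

```python
from itertools import combinations


def expected_mean_squares(factors, random_factors):
    """List the variance components in E(MS) for every effect of a complete factorial.

    An effect's E(MS) holds the error term "e", the effect's own component, and each
    interaction containing the effect whose remaining factors are all random.
    """
    random_set = frozenset(random_factors)
    effects = [frozenset(c) for r in range(1, len(factors) + 1)
               for c in combinations(factors, r)]

    def label(effect):
        # Keep the factor order supplied by the caller, e.g. "TS" rather than "ST".
        return "".join(f for f in factors if f in effect)

    table = {}
    for effect in effects:
        components = ["e", label(effect)]
        for other in effects:
            # Proper superset of the effect, with every extra factor random.
            if effect < other and (other - effect) <= random_set:
                components.append(label(other))
        table[label(effect)] = components
    return table


two_way = expected_mean_squares(["A", "B"], {"A"})
treatments_by_subjects = expected_mean_squares(["T", "S"], {"S"})
```

With A random, `two_way["B"]` lists the error, B, and AB components while `two_way["A"]` lists only error and A, in agreement with the example in the text. The Treatments × Subjects call reproduces the rows of Table 1 apart from the confounded Order component.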


2 The most general treatment of coefficients
for a complete factorial design is by Cornfield
and Tukey (1956). They use 1 − x/X as
coefficients for $\sigma_x^2$ where x refers to the num-
ber of replications and X is the total popula-
tion. 1 − x/X is also used as a coefficient for
x in $\sigma^2$ due to interactions if x is not involved
in the mean square in question. These coeffi-
cients serve to suppress terms completely
when they are fixed effects (if x = X, then
coefficient is zero). If x is very small and X is
infinite or very large, then the coefficient goes
to 1. These coefficients also reduce the $\sigma^2$
terms (are between zero and 1) when the
populations are finite but larger than the
samples. We shall not be concerned with this
latter situation. Thus in Table 1 the coeffi-
cient for $\sigma_e^2$ throughout the table and for
$\sigma_{ts}^2$ in T is 1 − s/S. Inasmuch as the sample of
Ss is small whereas the population S is
usually very large, the coefficient becomes 1.
The coefficient for $\sigma_{ts}^2$ in S is 1 − t/T; however,
t = T because we have all Treatment levels in
our experiment. Thus the coefficient is zero
and $\sigma_{ts}^2$ vanishes. The coefficient for $\sigma_{ts}^2$ in
TS is 1 because 1 − t/T does not appear for
either the T or S portions. Both are involved
in the TS mean square. The coefficient for
$\sigma_o^2$ in T is 1 − s/S, which becomes 1. All co-
efficients are multiplied by n, which in our
example is 1.
In Table 1 Subjects represents a
random variate and thus $\sigma_{ts}^2$ appears
in the Treatments effect. No Order
× Treatment effect, or any inter-
action containing Order, is present
because only one order is involved.3
Furthermore, $\sigma_t^2$ and $\sigma_o^2$ are con-
founded and cannot be separated.
The coefficient for $\sigma_o^2$ is the same as
for $\sigma_t^2$. Of course, if $\sigma_o^2 = 0$ then the
test of T by TS is a valid one. If $\sigma_o^2$
is not zero then positive bias occurs
in the F test of T, a tendency for too
many significant effects being re-
ported (Type I error).


Case II. Randomization of Order

Because we suspect the presence of
an Order effect (and interactions in-
volving this effect) we decide to ran-
domize the Order of presentation of
the Treatments to each individual
separately. The result of this pro-
cedure is indicated by Table 2. In
this case $\sigma_o^2$ has been removed from
T. Any effects of Order or any inter-
action will appear in $\sigma_e^2$ and, thus, be
felt in all effects, so that the F test of
T will be a valid one. However, $\sigma_e^2$
will be inflated if the Order or inter-
action effects are present.


Case III. Balanced Design: Inter-
actions Zero

This situation represents the
limited example which has been con-
sidered in detail by Gaito (1958a).
Hilgard (1951) and Lindquist (1953),
as well as others, have been concerned
with this case. Because of possible
practice effects we have one or more
Ss take one Order, one or more other


3 Even though we speak of Order effects and
Order interactions, it is actually trial effects 
and trial interactions which are involved, be- 
cause differences between trials, or differential 
effects of trials for different Ss, indicate that 
the different Orders do not give the same re- 
sults. However, it is usual to speak in terms 
of the former. 


TABLE 2

COMPONENTS OF VARIANCE INCLUDED IN
MEAN SQUARES FOR CASE II:
RANDOMIZATION OF ORDER

Source    E(MS)
T         $\sigma_e^2 + s\sigma_t^2 + \sigma_{ts}^2$
S         $\sigma_e^2 + t\sigma_s^2$
TS        $\sigma_e^2 + \sigma_{ts}^2$





Ss take another Order, etc., but as-
sume that all Order interactions are
zero. Table 3 indicates that $\sigma_o^2$ now
appears in TS, thus making for nega-
tive bias in the F test of T. The
balancing procedure equalizes the
various levels of each main effect but
not the interaction components. If
more than one factor is included in
the experiment, the levels of all main
effects and interactions not involving
the random effect are equalized, but
the interaction levels including the
random effect are not. Furthermore,
the magnitude of $\sigma_o^2$ inflation tends to
increase with increasing order of in-
teraction (e.g., mean square of $T_1T_2S$
is greater than $T_1S$ or $T_2S$).
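The direction of the bias in Cases I and III can be seen by comparing the expected mean squares of numerator and denominator when no Treatments effect exists. The sketch below uses the components listed in Tables 1 and 3; the function names and the numerical values of the components are hypothetical. The ratio of two expected mean squares is not the expected value of F, so the figures indicate only the direction of the bias.

```python
def null_ratio_same_order(var_e, var_ts, var_o, s):
    """Case I with no Treatments effect: the Order component sits in the numerator (T)."""
    numerator = var_e + var_ts + s * var_o
    denominator = var_e + var_ts
    return numerator / denominator


def null_ratio_balanced_order(var_e, var_ts, var_o, s):
    """Case III with no Treatments effect: the Order component sits in the denominator (TS)."""
    numerator = var_e + var_ts
    denominator = var_e + var_ts + s * var_o
    return numerator / denominator


# Hypothetical components: error 1.0, Treatments x Subjects 1.0, Order 0.5, with s = 4 Ss.
case_one = null_ratio_same_order(1.0, 1.0, 0.5, 4)
case_three = null_ratio_balanced_order(1.0, 1.0, 0.5, 4)
```

With these assumed values the ratio is 2.0 for Case I and 0.5 for Case III: above 1 where the text reports positive bias (Type I error), below 1 where it reports negative bias (Type II error).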

Case IV. Single Latin Square: Inter-
actions Present


The single latin square design has 
been used infrequently in recent years 
in psychology, possibly because of the 
criticisms of Lindquist (1953) and of 
McNemar (1951) concerning the fre- 
quent presence of interactions. This 
design (in which each S has a differ- 
ent Order) has been considered by 
Gaito (1958b) as the One Random 


TABLE 3

COMPONENTS OF VARIANCE INCLUDED IN
MEAN SQUARES FOR CASE III:
BALANCING OF ORDER BUT NO
ORDER INTERACTIONS PRESENT

Source    E(MS)
T         $\sigma_e^2 + s\sigma_t^2 + \sigma_{ts}^2$
S         $\sigma_e^2 + t\sigma_s^2$
TS        $\sigma_e^2 + \sigma_{ts}^2 + s\sigma_o^2$










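TABLE 4

COMPONENTS OF VARIANCE INCLUDED IN MEAN SQUARES FOR CASE IV:
SINGLE LATIN SQUARE ANALYSIS WITH NO ASSUMPTION CONCERNING INTERACTIONS

Source      E(MS)
T           $\sigma_e^2 + t\sigma_t^2 + \sigma_{ts}^2 + \sigma_{so}^2 + (1 - 2/t)\sigma_{tso}^2$
O           $\sigma_e^2 + t\sigma_o^2 + \sigma_{ts}^2 + \sigma_{so}^2 + (1 - 2/t)\sigma_{tso}^2$
S           $\sigma_e^2 + t\sigma_s^2 + \sigma_{to}^2 + (1 - 1/t)\sigma_{tso}^2$
Residual    $\sigma_e^2 + \sigma_{to}^2 + \sigma_{ts}^2 + \sigma_{so}^2 + (1 - 2/t)\sigma_{tso}^2$

Note. Because o = t = s, the coefficients for all main effects are given as t.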


Variate Model. That article also
deals with the Zero, Two, and Three
Random Variates Models.
The rule for the complete factorial de-
sign presented above must be modi-
fied to deal with this incomplete
factorial design. The above rule is
applied first. Then the following
additions are included. Residual con-
tains all interactions, and each main
effect is confounded with the triple
interaction and the double inter-
action containing the other two
effects. The paper by Wilk and
Kempthorne (1957) presents a gen-
eralized derivation for latin square
designs and the coefficients for each
$\sigma^2$ term in Table 4 are based on that
derivation. $\sigma_e^2$ and all interaction $\sigma^2$
terms except $\sigma_{tso}^2$ have a coefficient of
1. The coefficient of $\sigma^2$ for each main
effect is t. The coefficient for $\sigma_{tso}^2$ in
the Residual and the two fixed effects
(T and O) is 1 − 2/t; in S the coeffi-
cient gets closer to one (1 − 1/t) inas-
much as the random effect is in-
volved.

In this case the F test of T is nega-
tively biased unless $\sigma_{to}^2$ is zero. Even
though the F test is unbiased when
$\sigma_{to}^2$ is zero it is not a valid F test be-
cause it is not distributed as the F
distribution. A valid F test requires
that the interactions in the mean
squares of both the main effect and
the Residual must be random, nor-
mally distributed, and be a compo-
nent that would be expected in the
mean square as indicated by the rule
above. If these conditions are not
satisfied the result is a ratio of two
noncentral chi square statistics di-
vided by their respective degrees of
freedom, and the distribution de-
pends upon the parameters of un-
wanted components, in the present
situation $\sigma_{so}^2$ and $\sigma_{tso}^2$. For a valid
and unbiased F test, $\sigma_{to}^2$, $\sigma_{so}^2$, and
$\sigma_{tso}^2$ must be zero.


Case V. Lindquist Type II Design:
Interactions Present


This situation is the same as in 
Cases III and IV except that groups 
of Ss take each Order; also we allow 
all Order interactions to be present 
and analyze the results as a Lindquist 
Type II design (Table 5). This de- 
sign is actually a modification of the 
single latin square design and the 
arguments presented above for the 
One Random Variate Model are per-
tinent here. The Residual contains
$\sigma_e^2$ and all possible interactions except
$\sigma_{to}^2$, which has been removed. Each
main and interaction effect contains
$\sigma_e^2$, variance due to itself, and the
interaction of the effect with other
effects which are random. Further-
more, because of the confounding
aspects of the latin square each main
effect includes variance due to the
other two factors and variance due
to the triple interaction. In this de-
sign if only two Treatments and two
Orders are involved the T × O(w)
effect disappears (Lindquist, 1953).

The F test of T and T × O(b) will
TABLE 5

COMPONENTS OF VARIANCE INCLUDED IN MEAN SQUARES FOR CASE V:
BALANCING OF ORDER, ALL INTERACTIONS PRESENT, AND ANALYZED
AS LINDQUIST TYPE II DESIGN

Source                        E(MS)
Between Ss
  Groups or T × O(b)          $\sigma_e^2 + t\sigma_s^2 + s'\sigma_{to(b)}^2 + (1 - 1/t)\sigma_{tso}^2$
  Between Ss within Groups    $\sigma_e^2 + t\sigma_s^2 + (1 - 1/t)\sigma_{tso}^2$
Within Ss
  Treatments                  $\sigma_e^2 + s\sigma_t^2 + \sigma_{ts}^2 + \sigma_{so}^2 + (1 - 2/t)\sigma_{tso}^2$
  Order                       $\sigma_e^2 + s\sigma_o^2 + \sigma_{ts}^2 + \sigma_{so}^2 + (1 - 2/t)\sigma_{tso}^2$
  T × O(w)                    $\sigma_e^2 + s'\sigma_{to(w)}^2 + (1 - 2/t)\sigma_{tso}^2$
  Residual                    $\sigma_e^2 + \sigma_{ts}^2 + \sigma_{so}^2 + (1 - 2/t)\sigma_{tso}^2$


Note.— Because o =/, ¢ is used as the coefficient for both «,* and «,*. s’ refers to the number of Ss in each group, 


s to the number of Ss in the experiment. 


not be biased even if all interactions
are present. However, the F test of
T × O(w) will be negatively biased.
The unbiased tests will not be dis-
tributed as F because of nuisance
parameters, e.g., in Treatments, the
$\sigma_{so}^2$ and $\sigma_{tso}^2$. Thus the Type II
“mixed” design, which is one of a
number of designs which Lindquist
recommends for counterbalancing
purposes (1953, p. 163, Ch. 13), ap-
pears to give unbiased (but nonvalid)
results for the main effects, when all
interactions are present.

The advantage of the Type II de-
sign is that it allows for a separation
of both the Order and the Treatments
× Order effects. However, if the
latter is present the test of the
Treatments effect may not be mean-
ingful, even though unbiased. If
Order were a random effect, then the
test of the Treatments effect is mean-
ingful. However, usually Order rep-
resents a fixed effect. Thus if the
interaction is of a “reversal” type
(i.e., one Treatment is most effective
with one Order of presentation
whereas other Treatments are more
effective with different Orders), an F
test of T would be meaningless. How-
ever, in a “continuous spread” type
of interaction (i.e., the rank order of
the Treatments is the same for all
Orders but the difference between
Treatments varies with the Order of
presentation), a generalization based
on the F test of T would be meaning-
ful.

The Type II design represents one 
of a large number of “mixed’’ de- 
signs. Readers interested in the 
E(MS) for more of these should con- 
sult Harter and Lum (1955). 


Case VI. Balanced Treatments × Sub-
jects Design: Interactions Present


Let us take the same procedure as
in Case V but analyze the results as a
simple Treatments × Subjects design.
This result is indicated in Table 6.
The E(MS) for T is the same as in
Table 5; S contains all the between-
subjects variance terms of that table;
and TS contains the O, T × O(w), and
Residual components. Note that
$\sigma_o^2$ is contained in TS as was indi-
cated for Case III. The reader should
note also that TS is the same in
Tables 3 and 6, except that in the
latter table are included Order inter-
action $\sigma^2$ terms while in Table 3 these
are missing. For the E(MS) of Table
3 it was assumed that Order inter-
actions were not present.

As is obvious from Table 6, the F
test of T will be negatively biased be-
cause two unwanted components,
$\sigma_o^2$ and $\sigma_{to(w)}^2$, will be included in the
denominator. The defects occurring
in this situation are more severe than
in the above cases.


TABLE 6

COMPONENTS OF VARIANCE INCLUDED IN MEAN SQUARES FOR CASE VI:
BALANCING OF ORDER AND ALL ORDER INTERACTIONS PRESENT

Source    E(MS)
T         $\sigma_e^2 + s\sigma_t^2 + \sigma_{ts}^2 + \sigma_{so}^2 + (1 - 2/t)\sigma_{tso}^2$
S         $\sigma_e^2 + t\sigma_s^2 + s'\sigma_{to(b)}^2 + (1 - 1/t)\sigma_{tso}^2$
TS        $\sigma_e^2 + \sigma_{ts}^2 + \sigma_{so}^2 + s\sigma_o^2 + s'\sigma_{to(w)}^2 + (1 - 2/t)\sigma_{tso}^2$


DISCUSSION 


From the six cases presented above 
it is obvious that the possible defects 
which may occur in repeated mea- 
surements designs are extreme. It 
would appear that if one does have a 
repeated measurements design, the 
safest procedure would be to random- 
ize the order of treatments so that 
order and all interactions containing 
order would be included in $\sigma_e^2$ and
appear in all effects, unless he has 
strong reasons for believing that cer- 
tain interactions are not present. 
However, one might use a Lindquist 
Type II design. In the former design 
unbiased and valid F tests of the 
treatment effect are obtained. In the 
latter design unbiased tests of the 
treatment effect are obtained but 
these tests are not distributed as the 
F distribution and will not be mean- 
ingful if a ‘‘reversal”’ type interaction 
between order and treatments has 
occurred. 
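The randomization procedure recommended above (Case II) amounts to drawing an independent random order of the treatments for each S. A minimal sketch follows; the function name and the optional seed argument are hypothetical conveniences, the seed being included only so that an assignment can be reproduced.

```python
import random


def randomized_orders(subjects, treatments, seed=None):
    """Draw an independent random order of the treatments for each subject."""
    rng = random.Random(seed)
    orders = {}
    for subject in subjects:
        order = list(treatments)
        rng.shuffle(order)  # shuffles in place, all permutations equally likely
        orders[subject] = order
    return orders


orders = randomized_orders(["S1", "S2", "S3", "S4"], ["A", "B", "C"], seed=0)
```

Each S still receives every treatment exactly once; only the sequence varies, so that order and its interactions fall into the error term rather than into any one effect.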

All of these counterbalancing ex- 
amples have been of an intersubject 
nature. However, the results of 
intrasubject counterbalancing would 
be similar. For example, if intrasub- 
ject counterbalancing were used such 
that each S would receive two or 
more sequential orders (e.g., if two 
treatments, the Ss would take only 
the ABBA or BAAB orders), the $\sigma_o^2$
would be confounded with $\sigma_t^2$ in the
treatments effect. If inter- and intra-
subject counterbalancing were to be
employed, some Ss would receive two 
or more orders of presentation while 
other Ss would receive some reversal 
of these orders (e.g., if two treat- 
ments, some Ss would have the 
ABBA sequence while others would 
have the BAAB sequence). In this 
case if a subjects × treatments anal-
ysis is followed and a practice effect
is present which is constant from 
trial to trial for all Ss, no bias occurs 
in either the main effects or the inter- 
action; however, the within-cells term 
will be inflated. If the practice effect 
is not constant from trial to trial, and 
is either the same or not for all Ss, 
then inflation will occur in both the 
interaction and within-cells terms. 
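The distinction between a practice effect that is constant from trial to trial and one that is not can be illustrated arithmetically for a single ABBA sequence. In this sketch the function name and both practice functions (a gain of 2 units per trial, and a gain equal to the squared trial index) are hypothetical and serve only to show the contrast.

```python
def practice_totals(sequence, practice):
    """Sum the practice increment credited to each treatment over a trial sequence."""
    totals = {}
    for trial, treatment in enumerate(sequence):
        totals[treatment] = totals.get(treatment, 0.0) + practice(trial)
    return totals


# Constant gain per trial (linear practice effect): increments 0, 2, 4, 6.
linear = practice_totals("ABBA", lambda trial: 2.0 * trial)

# Gain that changes from trial to trial: increments 0, 1, 4, 9.
curved = practice_totals("ABBA", lambda trial: float(trial ** 2))
```

With the constant gain each treatment accumulates the same total (6 units), so practice does not favor either treatment within the sequence. With the changing gain the totals differ (9 for A, 5 for B); the BAAB sequence given to other Ss reverses this imbalance, which is consistent with the statement that the inflation then appears in the interaction term.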
The above considerations should 
make one cautious concerning the 
use of a repeated measurements de- 
sign. However, only the effects of 
order and interactions have been dis- 
cussed. There is another source of 
contamination in the repeated mea- 
surements designs, i.e., correlated ob- 
servations. It has been assumed by 
many investigators that by partition- 
ing a source of variation attribu- 
table to Ss, the problem of correla- 
tion has been handled. That this as- 
sumption is not true has been 
indicated by a number of people (e.g., 
Box, 1954; Danford & Hughes, 1957; 
Geisser & Greenhouse, 1958; Lubin, 
1954, 1957, 1958; Scheffe, 1956). 
Box (1954) indicates that when 
there is moderate correlation within 
rows (in psychological experiments
the row variable would represent Ss),
a great distortion occurs in the prob- 
ability levels for between-rows com- 
parisons but little distortion is intro- 
duced for between-columns compari- 
sons. The maximum correlation that 
Box studied was ±.40. In the case of
the negative correlation the percent 
probability for the test of columns 
(Treatments) was 5.90 rather than 
5.00 (which would result when corre- 
lation is zero); for positive correla- 
tion the percent probability was 
6.68. Box makes use of an approxi-
mate technique in which the degrees
of freedom are reduced by multiply-
ing each df by a fraction, epsilon ($\epsilon$),
which depends on the correlation
within-rows. The upper limit of $\epsilon$ is 1,
which will occur only if the variances
are equal and the correlation is con-
stant among the Treatments. In this
case the F ratio with the usual df can
be used. In the event that just two
treatments are involved, $\epsilon$ equals 1 if
the variances are equal. However, in
many designs using three or more
treatments, $\epsilon$ will be less than 1; thus
if the usual df are employed (without
reduction by $\epsilon$) an increase in Type I
errors will occur.


Geisser and Greenhouse (1958)
have extended Box’s result to de-
velop a conservative F test of treat-
ments. They show that $\epsilon \geq (k-1)^{-1}$
and thereby determine the lower
limit for the df to be 1 and n − 1, where k
refers to the number of treatments
and n is the number of Ss. (This re-
sult can be obtained by multiplying
the df for treatments (k − 1) and for
treatments × subjects [(k − 1)(n − 1)]
each by 1/(k − 1).) Thus the F test
with df of 1 and n − 1 can be em-
ployed when unequal covariation
occurs with one group of Ss. They
also develop a conservative test when
more than one group is involved. In
this case the df for the approximate
F test of treatments is 1 and N − g, where
N is the total number of Ss and g is
the number of groups. However, the 
authors maintain that the use of the 
lower limit may be too conservative. 
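The df adjustment can be sketched numerically. The routine below computes epsilon from a k × k covariance matrix of the treatments using the double-centered matrix form that is now standard for this correction; that this form corresponds to Box's own expression is an assumption here, and the function names and example matrices are hypothetical. The matrix must not have all entries equal, since the denominator would then be zero.

```python
def box_epsilon(cov):
    """Epsilon for a k x k covariance matrix given as a list of lists."""
    k = len(cov)
    row_means = [sum(row) / k for row in cov]
    col_means = [sum(cov[i][j] for i in range(k)) / k for j in range(k)]
    grand_mean = sum(row_means) / k
    # Double-center the matrix: remove row and column means, add back the grand mean.
    centered = [[cov[i][j] - row_means[i] - col_means[j] + grand_mean
                 for j in range(k)] for i in range(k)]
    trace = sum(centered[i][i] for i in range(k))
    sum_of_squares = sum(value * value for row in centered for value in row)
    return trace * trace / ((k - 1) * sum_of_squares)


def adjusted_df(k, n, epsilon):
    """Multiply the usual df, (k - 1) and (k - 1)(n - 1), by epsilon."""
    return epsilon * (k - 1), epsilon * (k - 1) * (n - 1)


# Equal variances and constant correlation (.50): epsilon reaches its upper limit.
uniform = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
eps_uniform = box_epsilon(uniform)

# An extreme, rank-one covariance pattern: epsilon falls to its lower limit 1/(k - 1).
extreme = [[1.0, 0.0, -1.0], [0.0, 0.0, 0.0], [-1.0, 0.0, 1.0]]
eps_extreme = box_epsilon(extreme)

conservative = adjusted_df(3, 10, eps_extreme)
```

For three treatments and ten Ss the extreme pattern gives df of 1 and 9, that is, 1 and n − 1, the lower limit described in the text; the uniform pattern leaves the usual df of 2 and 18 unchanged.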

Danford and Hughes (1957) argue 
for the use of the usual analysis of 
variance design, maintaining that the 
equal covariance assumption (con- 
stant correlation) is tenable for cer- 
tain experimental situations. They 
state that some experimental data 
have shown comparable correlation 
coefficients (r’s of .70 to .90).4 They 
criticize Scheffe’s (1956) suggestion 
to use Hotelling’s $T^2$ statistic for
testing the fixed main effect because 
of the above. Likewise, they indicate 
that if the equal covariance assump- 
tion is correct the power of the usual 
F test is greater (in some cases, much 
greater) than is the power of Hotel- 
ling’s test. 

Lubin (1954, 1957, 1958) has co-
gently considered the repeated meas-
urements designs, not only con-
sidering the effects of correlated ob-
servations but also treatment × order
interactions, and other learning or
“carry over” effects. Because of these
contaminating effects, he recom-
mends the use of a modification of
Hotelling’s $T^2$ test, or a nonpara-
metric rank-order test if one is inter-
ested in the relative efficacy of sev-
eral treatments (unless a treatment
× order interaction is present). If
this interaction is present he advo-
cates a matched Ss design in which
each S receives only one treatment.

Thus the F test is theoretically
correct only if constant correlation
among treatments is present. If only
two treatments are involved, and
homogeneity of variance is present,
then it follows that the F test is
always appropriate. If unequal corre-
lation occurs, too many significant
Fs will be reported. With moderate,
but unequal, correlation among treat-
ments, the increase in number of sig-
nificant results reported for treat-
ments effects appears to be small,
using Box’s approximation. The in-
crease with greater correlation is un-
known. However, the F test indi-
cated by Geisser and Greenhouse
allows one to make a conservative
test.


4 The experimental data of concern are not
cited, however.

In conclusion, it is apparent that 
multiple defects are present in re- 
peated measurements designs. The 
design using randomization of the 
order of treatments avoids the num- 
erous defects but ¢ may be quite 
large. With randomization the corre- 
lation effect should be minimized. 
The Lindquist “mixed” design over-
comes some of the defects but the F
test of treatments, even though un-
biased, may not be a valid F test and
may be meaningless. The matched
Ss design recommended by Lubin
would appear to be the safest pro-
cedure if enough Ss are available.
However, the important point to
stress is that if an investigator resorts
to a repeated measurements design he
should be aware of possible distor- 
tions which may occur and be able 
to defend his assumptions concern- 
ing the order effect, the order inter- 
actions, and the correlated observa- 
tions. 


SUMMARY 


Six types of analysis of repeated 
measurements designs are indicated. 
The effects of order, interactions con- 
taining order, and correlated obser- 
vations on the components of vari- 
ance and analysis of variance tests of 
significance are considered. The first 
two act, in general, to inflate the error 
estimates and thus to increase the 
probability of a Type II error. The 
correlated observations (if unequal) 
have the opposite effect, i.e., increase 
the probability of a Type I error. It 
is suggested that caution be exercised 
in the use of these designs; randomi- 
zation of the order of treatments or 
matched subjects appear to be the 
safest procedures. The Lindquist 
Type II “mixed” design overcomes 
some defects but is not completely 
appropriate.


HUMAN TRACKING BEHAVIOR¹


JACK A. ADAMS 
University of Illinois 


The subject matter of this paper is 
a critical review and analysis of re-
search, issues, and points of view
associated with human behavior in
one- and two-dimensional tracking
tasks. Tracking tasks have never
been given explicit definition and one
of the purposes of this paper is to ten-
tatively advance the general bounds
of a tracking situation, but for those
who are unfamiliar with it, a
working definition which enjoys the
consensus of most psychologists is
as follows:


1. A paced (i.e., time function) externally 
programed input or command signal defines a 
motor response for the operator, which he per- 
forms by manipulating a control mechanism. 

2. The control mechanism generates an out- 
put signal. 

3. The input signal minus the output signal 
is the tracking error quantity and the opera- 
tor’s requirement is to null this error. The 
mode of presenting the error to the operator 
depends upon the particular configuration of 
the tracking task but, whatever the mode, the 
fundamental requirement of error nulling al- 
ways prevails. The measure of operator pro- 
ficiency ordinarily is some function of the 
time-based error quantity. 
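The working definition above can be made concrete with a minimal sketch. The error is computed as input minus output at each sample, as in item 3. The two proficiency measures shown (root-mean-square error and proportion of time on target) are offered only as examples of "some function of the time-based error quantity"; they, the function names, the tolerance, and the simulated signals are assumptions made for illustration.

```python
import math


def tracking_error(input_signal, output_signal):
    """Error at each sample: input signal minus output signal."""
    if len(input_signal) != len(output_signal):
        raise ValueError("signals must have the same number of samples")
    return [i - o for i, o in zip(input_signal, output_signal)]


def rms_error(errors):
    """Root-mean-square of the error over the trial."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def time_on_target(errors, tolerance):
    """Proportion of samples on which the absolute error is within tolerance."""
    return sum(1 for e in errors if abs(e) <= tolerance) / len(errors)


# Hypothetical trial: a 0.5-cps sine-wave command sampled every 0.1 sec,
# and an operator output that reproduces the command two samples late.
dt = 0.1
command = [math.sin(2 * math.pi * 0.5 * n * dt) for n in range(50)]
response = [0.0, 0.0] + command[:-2]

errors = tracking_error(command, response)
print("RMS error:", round(rms_error(errors), 3))
print("time on target:", time_on_target(errors, tolerance=0.2))
```

Both measures reduce the whole error record to a single number, so they describe proficiency without preserving the time-varying character of the response.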


The usual tracking task has a vis- 
ual display but there is no necessity 
for this. On occasion, auditory track- 
ing tasks have been devised (Forbes, 
1946; Humphrey & Thompson, 1952a, 
1952b, 1953). The most simple and
well-known visual tracking task is
the Rotary Pursuit Test (Melton,
1947) which employs a repetitive in-
put signal and, although investiga-
tions using the Rotary Pursuit Test
are not ordinarily included in that
body of research which is considered
to study tracking behavior per se, it
is nevertheless an unequivocal ex-
ample of the breed. Tracking studies
typically use more elaborate appa-
ratus which allows for controlled
manipulation of such variables as
the function for the input signal,
scale factors, mathematical trans-
formations of the output signal,
characteristics of the control mecha-
nism, etc.


1 This research was supported by the
United States Air Force under Contract No.
AF 49(638)-371, monitored by the Air Force
Office of Scientific Research of the Air Re-
search and Development Command.

A number of psychologists read this manu-
script in draft form and contributed to its
improvement. The detailed and critical com-
ments of F. C. Bartlett, E. A. Fleishman,
C. B. Gibbs, N. B. Gordon, J. A. Leonard, and
A. T. Welford were particularly appreciated.

While investigations of tracking 
behavior might legitimately be sub- 
sumed under the time-honored rubric 
“motor skills,” this label is misleading
in hinting by implication and text-
book tradition that motor behavior,
such as tracking, is disassociated
from so-called “higher processes.”
British investigators in particular
have analyzed the acquired ability to
predict input stimulus sequences as a
key intervening response class in de-
termining the proficiency of the meas-
ured motor responses in tracking
tasks, thus emphasizing the interlac-
ing of “higher” and “lower” processes.
These British studies will be discussed
in detail later, but passing mention of
them at the outset seems worthwhile
for establishing the archaic connota-
tions of “motor skills.” Research by
Adams (1957), Fleishman (1954, 1957a,
1957b, 1958), and Fleishman and
Hempel (1954, 1955, 1956) on vari-
ables influencing individual differ-
ences in motor behavior also docu-
ments the inherent complexity of the
response totality elicited in motor
tasks. Helson (1949), in discussing
variables influencing the subject's
standard of excellence in a tracking
task, includes perceptual and moti-
vational states in addition to motor
factors as influential determiners of
motor behavior.


BASIC TERMINOLOGY AND FRAME
OF REFERENCE 


Independent variables influencing 
tracking behavior will be divided into 
two classes: task variables and pro- 
cedural variables. Task variables are 
machine-centered. They are the 
physical values of the tracking de- 
vice, and they include such factors as 
the nature of the input signal, con- 
figuration of the display, design of 
the control system, mathematical
transformations relating control dis-
placement and changes in the output
signal, etc. Procedural variables are
man-centered. They are manipulable 
nontask quantities, and examples of 
them are instructions, number of 
practice trials, length of the practice 
trial, and time between trials. Also,
the indicants which are displayed to
the operator will be implicitly as-
sumed to be simple elements, such as
needles or dials, pointers, dots on 
cathode ray tubes, etc. Special prob- 
lems that arise when the display is 
perceptually complex and requires 
the interpretation of forms, shapes, 
colors, etc. will be ignored. 


THE TRADITION OF ENGINEERING 
PSYCHOLOGY 

A dominant influence in tracking 
research is the experiments of engi- 
neering psychology, with the em- 
phasis being largely on the relations 
between measures of tracking be-
havior and task variables. The engi-
neering psychologist has as his goal
the prediction of the characteristics
of man-machine systems, and this
goal requires careful attention to the
task variables which influence the 
operator. Representative examples 
of several hundred task-oriented track- 
ing experiments are studies of control 
loadings (Bahrick, 1957; Bahrick, Ben- 
nett, & Fitts, 1955; Bahrick, Fitts, & 
Schneider, 1955; Briggs, Bahrick, & 
Fitts, 1957; Howland & Noble, 1953; 
Weiss, 1954), input signal character- 
istics (Hartman, 1957; Hartman & 
Fitts, 1955; Noble, Fitts, & Warren, 
1955), the magnitude of lag between 
control movement and system out- 
put (Conklin, 1957; Warrick, 1949), 
the effects of visual noise (Briggs & 
Fitts, 1956; Briggs, Fitts, & Bahrick, 
1957), mathematical transformations 
of the output signal (Briggs, Fitts, & 
Bahrick, 1958), and compensatory 
vs. pursuit tracking (Chernikoff & 
Taylor, 1957; Poulton, 1952b). Task 
variables, because of their role in de- 
termining the behavioral require- 
ments for the operator, are an im- 
portant class of variables for psy- 
chology and engineering psycholo- 
gists have made a notable contribu- 
tion in directing attention toward 
neglected determiners of human be- 
havior. However, this strong task 
orientation has led to the neglect of 
procedural variables that influence 
the operator, and thus the efficiency 
of the total man-machine system. A 
recent article (Taylor, 1957) has 
clearly stated this emphasis: 

. . . human engineering aims first at building
better systems and secondarily at improving 
the lot of the operator. Thus, whereas con- 
ventional psychology, both basic and applied, 
is anthropocentric, human engineering is 
mechanocentric (p. 252). 


This statement succinctly summa-
rizes the task-oriented approach of
engineering psychology and expresses
a downgrading of procedural vari-
ables related to training, retention,
fatigue, motivation, etc. It is for-
gotten, or intentionally neglected,
that the engineering psychologist
must, over the long run, develop the
capability to predict the effectiveness
of a man-machine system for different
states of the operator, and this means
a strict scientific accounting of a
broad range of variables which in-
fluence man. There are few who un-
derestimate the importance of task
variables in determining the behavior
of a man-machine system, but there
seems to be no sound justification for
relegating procedural variables to a
secondary status. In the beginning,
an applied branch of a science might
profitably concern itself with rank
ordering its variables in terms of their
potency in influencing a criterion
(Taylor & Garvey, 1959), but this
approach does not deserve being
elevated to a research philosophy.
Sophisticated applied science, just as
sophisticated basic science, must work
toward a precise accounting of all
variables and their interrelations.


Fig. 1. The analogy commonly drawn between a closed-loop electromechanical servo-
system and a human operator as an error-nulling agent in a tracking task.


Whereas general experimental psy-
chology has often looked to tradi- 
tional behavioral theory as a basis for 
its tracking studies, many engineer- 
ing psychologists, with their mecha- 
nocentric views, have turned towards 
the feedback theory of closed-loop
servomechanisms (Bower & Schul-
theiss, 1958; Brown & Campbell,
1948; Goode & Machol, 1957) as a 
model for a man-machine tracking 
system. Basically, a closed-loop 
servosystem is an electromechanical 
error-nulling system which compares 
an input signal with an output signal 
and works toward reducing the differ- 
ence between them. Because error 
nulling is a basic characteristic of 
systems which include the human 
operator as a tracking component, 
some engineering psychologists view 
physical servotheory as a potential 
source of descriptive relationships 
for manual tracking systems. Figure 
1 shows the parallel that is ordinarily 
drawn between a servosystem and a 
man-machine tracking system. The 
theory of servomechanisms is a method
of mathematical analysis concerned 
with the description of the output of 
a complex system as a function of the 
input, and it allows the system's 
analyst to state the functional char- 
acteristics of his system with some 
precision. The expression of the in- 
put-output relations is by means of a 
complex ratio called the transfer 
function which expresses the nature
of the transformations that the sys-
tem imposes on the input signal. In a
system comprised of a number of 
components, a transfer function is 
determined for each component and 
these can then be combined to yield 
an overall transfer function for the 
system. An important feature of 
these methods of system analysis is 
that it is not necessary to painfully
trace the signal through each element 
of a component to compute the input- 
output transformation represented 
in the transfer function. Rather, a 
“black box” approach can be taken 
where input-output relationships are 
directly compared without attending 
to the many intermediate transforma- 
tions which occur to the signal as it 
passes through the component. 
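A minimal sketch of the "black box" combination of components follows. It assumes the components are connected in series (cascade), in which case the overall transfer function is the product of the component transfer functions; other arrangements combine differently. The two components (a pure gain and a first-order lag), their parameter values, and all function names are hypothetical and serve only to illustrate the idea.

```python
import cmath
import math


def gain_element(k):
    """Transfer function of a pure gain: H(s) = k."""
    return lambda s: complex(k)


def first_order_lag(tau):
    """Transfer function of a first-order lag: H(s) = 1 / (tau*s + 1)."""
    return lambda s: 1.0 / (tau * s + 1.0)


def cascade(*components):
    """Overall transfer function of components in series: the product."""
    def overall(s):
        h = complex(1.0)
        for component in components:
            h *= component(s)
        return h
    return overall


def frequency_response(transfer_function, freq_cps):
    """Amplitude ratio and phase shift (degrees) for a sinusoidal input."""
    s = complex(0.0, 2 * math.pi * freq_cps)
    h = transfer_function(s)
    return abs(h), math.degrees(cmath.phase(h))


# Hypothetical two-component system: gain of 2 followed by a 0.5-sec lag.
system = cascade(gain_element(2.0), first_order_lag(0.5))
for freq in (0.1, 0.5, 1.0, 2.0):
    amplitude, phase = frequency_response(system, freq)
    print(f"{freq:4.1f} cps: amplitude ratio {amplitude:.3f}, phase {phase:.1f} deg")
```

Only the input-output relation of each component enters the computation; nothing is said about the elements inside a component, which is the sense of the "black box" approach.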

The servosystem analyst is con- 
cerned with input-output relations as 
they are manifested in two domains: 
time and frequency. In the time
domain the time-varying character-
istics of the system are described in
terms of overshooting, undershooting,
oscillations, steady state errors, etc.,
in response to a step input. In the
frequency domain the output of the 
system is examined for transforma- 
tions of a sinusoidal input after the 
transients have died out. Finally,
and perhaps most important in this
brief exposition on the methods of
servosystem analysis, the entire
mathematical structure is founded
on the assumption of linearity.
Fundamentally, this assumption means 
that the system obeys the superposi- 
tion theorem which states that the 
system response to the sum of a set of 
inputs is equal to the sum of the re- 
sponses made to each input sepa- 
rately. This means that the perform- 
ance of the system can be predicted 
for any complex input providing we 
know the response of the system to 
each of the constituent inputs com-
prising the complex input. Another
implication of the linear assumption
is that the system will accurately re-
produce input sinusoidal frequencies
after transients have died out, although
there may be phase shift and ampli-
tude change. Furthermore, it is
implicit that the output of the system
is solely a function of the input and
this functional relationship is de-
scribed by the transfer function; i.e.,
it is not a function of such variables
as time, where the system might
perform one class of transformations
on the inputs at time t and another
class at a later time.
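The superposition property can be checked numerically, as in the following sketch. A discrete first-order lag stands in for a linear system, and the same lag with its output clipped stands in for a nonlinear one. Both systems, their parameter values, and the test inputs are hypothetical illustrations and are not drawn from the servotheory texts cited.

```python
import math


def first_order_lag(inputs, a=0.8):
    """Linear system starting from rest: y[n] = a*y[n-1] + (1 - a)*x[n]."""
    y, out = 0.0, []
    for x in inputs:
        y = a * y + (1 - a) * x
        out.append(y)
    return out


def saturating_lag(inputs, a=0.8, limit=0.5):
    """The same lag with its output clipped to +/- limit (nonlinear)."""
    y, out = 0.0, []
    for x in inputs:
        y = a * y + (1 - a) * x
        y = max(-limit, min(limit, y))
        out.append(y)
    return out


def obeys_superposition(system, x1, x2, tol=1e-9):
    """True if the response to x1 + x2 equals the sum of the separate responses."""
    combined = system([u + v for u, v in zip(x1, x2)])
    summed = [u + v for u, v in zip(system(x1), system(x2))]
    return all(abs(c - s) <= tol for c, s in zip(combined, summed))


# Two constituent inputs: a unit step and a sine wave with a 10-sample period.
step = [1.0] * 40
sine = [math.sin(2 * math.pi * n / 10) for n in range(40)]

print("linear lag obeys superposition:", obeys_superposition(first_order_lag, step, sine))
print("saturating lag obeys superposition:", obeys_superposition(saturating_lag, step, sine))
```

For the linear lag, the response to the complex input can be predicted from the responses to its constituents; for the clipped version it cannot, which is the difficulty the linearity assumption raises for any system containing a nonlinear component.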

Ellson’s paper (1949) best expresses 
the hope of some engineering psy- 
chologists that the transfer function 
for the human operator might be 
determined and provide an analytical 
means of predicting the performance 
of the total man-machine system, 
and of optimizing the performance of 
the system by designing hardware 
components to complement the re- 
sponse characteristics of man. This 
goal of mathematically describing 
the characteristics of man and his 
machine elements is scientifically ad- 
mirable but, regrettably, it was 
doomed from the beginning by the 
massive barrier of the linearity as-
sumption. Almost self-evident is the
fact that the human operator is a non-
linear component of a system with
his intricate adaptive propensities 
toward learning, fatiguing, motiva- 
tional shifts, etc. and that there is 
faint possibility of finding the trans- 
fer function which can be used by 
system designers to optimize the per- 
formance of a system by capitalizing 
on the transformations that man im- 
poses on a signal as it enters the 
receptors, makes passage through the 
organism, and is emitted anew by the 
responding effector system (Birming- 
ham & Taylor, 1954; Ellson, 1949; 
Fitts, 1951; Searle & Taylor, 1948). 
Birmingham and Taylor (1954) have
nicely expressed this matter of non-
linearity for the tracking human
operator:

This adaptability on the part of the man 
is, of course, a great boon to the control de- 
signer, since he can rely upon the human to 
make the most of any control system, no mat-
ter how inadequate. It is this which probably
constitutes the most important single reason
for using men in control loops. Yet, this very
adjustability renders any specific mathemati-
cal expression describing human behavior in
one particular control loop quite invalid for
another man-machine arrangement. This sug-
gests strongly that “the human transfer func-
tion” is a scientific will-o’-the-wisp which can
lure the control system designer into a fruitless 
and interminable quest (p. 1752). 

Fitts (1951) has reported on certain 
limited conditions where human re- 
sponse appears to approximate linear- 
ity but, in general, it would seem 
that the nonlinearities of human be- 
havior negate the usefulness of the 
servomodel and its mathematical 
techniques as a serious theoretical 
instrument for behavior theory or as 
a tool for the design of man-machine 
systems. Nonlinearities do, of course, 
occur in some physical systems but 
the assumption of linearity is met 
sufficiently well and often to make 
the theory of important value for the 
physical sciences. This could hardly 
be said for psychology where non- 
linearities are an inherent, and in- 
deed the most interesting and chal- 
lenging, aspect of the human operator. 
It must be concluded, therefore, that
present-day servotheory stands in an 
analogous, not a scientific, relation- 
ship to man-machine tracking sys- 
tems. 

Even if analytical methods even- 
tually become available to handle the 
nonlinearities of closed-loop human 
behavior, it is unlikely that engineer- 
ing psychology will be able to make 
effective use of them if it continues its 
preoccupation with task variables 
(Taylor, 1957; Taylor & Garvey, 
1959) and underplays the role of 
procedural variables which are basic 
determiners of dispositional states of
the operator and contribute sub-
stantially to the nonlinearities. Engi-
neering texts on servotheory (Bower
& Schultheiss, 1958; Brown & Camp-
bell, 1948; Goode & Machol, 1957)
distinguish between analysis, or the
description of a system in existence,
and synthesis or the prediction of the 
characteristics of components of the 
system to achieve certain objectives. 
Conceivably, we might eventually 
describe a man-machine tracking 
system already in existence because 
the response characteristics of the 
human operator can be empirically 
determined for the range of inputs of 
interest and the operator states that 
prevail. However, synthesizing is 
quite different because it requires 
that we know the laws of human be- 
havior as a function of task and pro- 
cedural variables and are able to 
predict the characteristics of the hu- 
man response functions. Questions 
relating to such operator states as 
learning and fatigue most certainly 
will arise and it is evident that these 
queries will not be answerable if task 
variables are taken as the primary 
research domain of engineering psy- 
chology. Engineering psychology, it 
would seem, cannot escape the bur- 
den of the same variables and searches 
for lawfulness which traditionally 
occupy all psychologists. 

In defense of the servotheory ap- 
proach to tracking, its protagonists 
have been engaged in proper search 
for a descriptive mathematical device 
for man-machine tracking systems 
which includes provisions for task 
variables and the properties of re-
sponse outputs to inputs which are
continuous with respect to time. Con-
temporary behavior theories ordi-
narily employ measures of behavior,
such as frequency and latency, which 
can be defended as operationally 
meaningful dependent variables but 
which are gross summary indices of 
complex behavior sequences and often
do violence to the subtleties of the
ongoing behavior. Commonly, psy- 
chologists in their laboratory research 
will elicit elaborate time-based re- 
sponse sequences from an organism 
and then will ignore completely the 
time-varying characteristics of the 
responding in their measurement. In 
contrast, psychologists studying track- 
ing have recognized, almost from the 
beginning, that their dependent meas- 
ures should somehow describe the 
prominent characteristics of time- 
based response functions. And, be- 
cause contemporary behavior theories 
give no attention to time functions, 
tracking psychologists appear to have 
suffered disenchantment and have 
turned to the mathematical schema 
of closed-loop servotheory, inade- 
quate though it is, because it grap- 
ples directly with the measurement 
and description of time-varying quan- 
tities. The fact that servotheory is of 
little value for quantitative descrip- 
tion of man-machine tracking sys- 
tems should not allow us to forget 
that the interest in it has reflected a 
legitimate concern about measure- 
ment issues and variables which are 
important for the response phenom- 
ena under investigation. 


TRADITION OF GENERAL EXPERI- 
MENTAL PSYCHOLOGY 


Basic research on tracking by gen-
eral experimental psychologists has
not had the strong emphasis of task
variables. Frequently, in basic re-
search, the experimental task has
been a convenient means of eliciting
a response class for the purposes of
manifesting underlying behavioral
processes which are of theoretical
rather than practical interest, and
consequently tracking tasks have not
been studied for their own sake. Ex-
amples of this approach are many of
the tracking studies on the Rotary
Pursuit Test with interest in fatigue-
like effects or, more exactly, the im-
plications of Hull’s (1943) expres-
sions of reactive and conditioned
inhibition for behavior (Adams, 1956; 
Adams & Reynolds, 1954; Kimble & 
Horenstein, 1948). Other studies of 
fatigue processes (Floyd & Welford, 
1953; Payne & Hauty, 1954; Siddall 
& Anderson, 1955) using tracking 
tasks have had a similar general con- 
cern and have shown little interest in 
the study of tracking for its own sake. 
The interest in task variables per se 
which has preoccupied engineering 
psychology has been largely absent in 
the research of general experimental 
psychology. This has been a healthy 
countertrend to the task emphasis of 
engineering psychology but the ap- 
proach of using virtually any con- 
venient task to elicit a response class 
can be considered a deficiency be- 
cause it shows a lack of appreciation 
for the influence of task variables on 
behavior, and the possible interac- 
tions that can be expected to occur 
between task and procedural vari- 
ables. These studies seem to have 
implicitly assumed that behavioral 
laws will transcend particular char- 
acteristics of a task, but this is an 
unlikely possibility because of the 
extensive work in engineering psy- 
chology showing the potent influence 
of task variables on performance. 
There is good reason to expect that 
many task variables will interact with 
those variables which have been of 
interest in testing theoretical deduc- 
tions. To illustrate, if it were even- 
tually found that a major cause for 
the depressant effects of massed 
practice on the tracking response was 
that work inhibition degraded the 
quality of proprioceptive feedback, 
the behavior functions would, as a 
minimum, have to be expressed in 
relation to the interaction of inter- 
trial interval and those control sys- 
tem variables which determine pro- 
prioceptive feedback. Helson (1949), 
in a report of the Foxboro investiga-
tions which were an early series of
systematic tracking studies, points 
out that both task and procedural 
variables are pertinent to a complete 
understanding of human behavior. 
Lewis (1953) has urged closer atten- 
tion to the relations between the 
physical organization of tasks and the 
complexities of behavior. 

An important line of tracking re- 
search, which can be subsumed under 
the rubric of general experimental 
psychology, has been dominated by 
British investigators of the Applied 
Psychology Research Unit, Cam- 
bridge University, and mainly con- 
cerns efforts to delineate the intrinsic 
characteristics of the overt motor 
tracking response, and to identify 
and assess the response classes which 
intervene between the displayed stim- 
uli and the measured motor response. 
Examples of these interests are the 
question of whether the apparently 
smooth, continuous tracking response 
is fundamentally intermittent (Cher- 
nikoff & Taylor, 1952; Craik, 1947, 
1948; Davis, 1956; Elithorn & Law- 
rence, 1955; Hick, 1948; Poulton, 
1950; Searle & Taylor, 1948; Taylor 
& Birmingham, 1948; Vince, 1948a, 
1948b, 1949; Welford, 1952) and the 
conditions under which the human 
operator learns to predict or antici- 
pate changes in the input signal 
(Bartlett, 1951; Craik, 1947, 1948; 
Leonard, 1953; Poulton, 1952a, 1957a, 
1957b, 1957c; Vince, 1953, 1955). 
These studies have manipulated both 
task and procedural variables and 
have, in many respects, been the most 
influential of all in improving our 
scientific understanding of tracking 
behavior because they have attempt- 
ed, in a detailed and analytical fash- 
ion, to clarify the various response 
facets of tracking behavior and the 
variables determining them. It is 
perhaps safe to say that these studies 
have stood as a numerical minority 
in tracking research, and this is un-
fortunate because such information
stands as the foundation of any sys-
tematic empirical and theoretical
organizations of tracking behavior.
Neither the studies of tracking qua 
tracking which have arisen out of the 
applied interests of engineering psy- 
chology, nor the studies of theoretical 
psychology where tracking tasks have 
been used as a matter of convenience, 
can progress very far until their find- 
ings are related to the complex char- 
acteristics of tracking behavior. Ana- 
lytical tracking studies in this vein 
will be discussed in some detail in 
later sections of this paper.


AREAS OF NEGLECT

With some exceptions, engineering 
psychology and general experimental 
psychology have tended to gloss over 
three fundamental topics which must 
be given more attention if we are 
eventually to have the beginning of a 
theory of tracking behavior: 

1. Tracking tasks have never been 
defined other than by convention. 
Early interests in tracking behavior 
arose out of applied situations where 
a continuously generated error quan- 
tity had to be nulled by continuous 
operator movements. Laboratory 
studies of tracking follow this applied 
tradition of a continuous task, al- 
though on occasion discrete displace- 
ments of the input signal have been 
used (Craig, 1949; Ellson, Hill, & 
Craig, 1949; Rund, Birmingham, 
Tipton, & Garvey, 1957; Searle & 
Taylor, 1948; Taylor & Birmingham, 
1948; Vince, 1948b, 1949). An at- 
tempt must be made, at least in a 
preliminary way, to define the allow- 
able variations in input, both in type 
and functional form, as well as the 
characteristics of the control system 
used for responding. 

2. Not enough attention has been 
given to the emphasis (largely British) 
on a more detailed description of 
behavior in tracking. Recognition
must be given to the presence and
interaction of several overt and inter- 
vening response and stimulus classes, 
and how these factors act to deter- 
mine the characteristics of the meas- 
ured motor response. 

3. Relatively little interest has 
been expressed in multidimensional 
tracking tasks having two or more 
stimulus sources in the same or differ- 
ent sense modalities, and correspond- 
ing dimensions in the control system 
for response to each source. Most 
tracking research has been performed 
on one-dimensional tasks. The im- 
plications of various ways of organiz- 
ing multiple inputs and the control 
systems for response to them need 
more formalization and research. 

This paper will, in turn, discuss 
issues, problems, and research asso- 
ciated with each of these three areas. 


Definition of Tracking 


A one-dimensional tracking task 
will be defined by the following con- 
ditions: 

1. An externally driven input sig- 
nal defines an index of desired per- 
formance and the operator actuates 
the control system to maintain align- 
ment of the output signal of the con- 
trol system with the input signal. 
The discrepancy between the two 
signals is the error and the operator 
responds to null the error. Two basic 
types of tracking tasks are differenti- 
ated by how this error quantity is 
represented: (a) Pursuit Tracking. 
The display has two indicants. One
is actuated by the input signal and
the other is linked to the output
signal of the control system. The two
indicants are presented directly to
the operator and he responds to null
the error difference between them.
(b) Compensatory Tracking. The
error to be nulled is not the difference
between two directly observed indi-
cants primarily linked to the input
and output signal as in pursuit track-
ing. Instead, the error observed in
pursuit tracking is abstracted and 
used to actuate a single indicant in 
relation to a fixed reference. The 
operator’s task, just as in pursuit 
tracking, is to null this error. The 
principal difference between pursuit 
and compensatory tracking is that 
with the latter the operator never 
observes the uncontaminated action 
of the input or output signal directly 
—only the error difference between 
them. 

2. The input signal is time-based 
and independent of the operator's 
response, i.e., the task is paced. A 
paced task is distinguished from 
a self-paced task where stimulus 
changes are a function of operator 
responding (Adams, 1954). 

3. The control system has con- 
straints that enforce certain transi- 
tional courses of action on the human 
operator. Instead of being able to 
move the control from a given posi- 
tion to any other position, the opera- 
tor must move through defined inter- 
vening states of the control system. 
For example, consider a one-dimen- 
sional visual tracking task using a 
pivoted control lever with hypotheti- 
cal control Positions A, B, C, and D. 
If the operator is at Position B at
time t, he has a three-choice decision
for moving the control at time t+1,
each with a probability of being cor-
rect: he can repeat the response of
time t and leave the control at Posi-
tion B, or he can move the control to
either Position A or C. At the two 
extreme limiting positions of the con- 
trol, only two choices are involved: 
leave the control where it is, or move 
it to the position adjoining the limit- 
ing one. By this definition, any task 
where the operator has free transi- 
tional access to all of the control sys- 
tem states is prohibited from being a 
tracking task. 

4. The states of the input signal 
have the same transitional constraints
as the control system. The input
signal, in changing from time t to
t+1, must change according to con- 
straints defined by the control sys- 
tem. By imposing the same con- 
straints on the input signal and the 
control system, the tracking task is
given a degree of feasibility for the
human operator, which means that the
input cannot take any action which,
in principle, cannot be met by action 
of the control system. This does not 
mean that a tracking task must allow 
near perfect performance by the 
operator. The input function may be 
a high frequency sine wave to which 
the operator can never achieve a high 
level of proficiency, but this is a be- 
havioral matter and not a function of 
inherent design features of the task. 
Table 1 presents the permissible 
transitional states for the hypotheti- 
cal four-state tracking task discussed 
above. 
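
The constraint structure described for the four-state task can be sketched in code. This is a minimal illustration, not part of the original analysis: the function names `allowed_matrix` and `random_course`, the choice of equal probabilities among the permitted moves, and the seed are all assumptions made only for the example.

```python
import random

STATES = ["A", "B", "C", "D"]  # hypothetical control positions from the text


def allowed_matrix(n_states):
    """Return an n x n matrix of booleans: True where a move from state i
    at time t to state j at time t+1 is permitted (stay, or move to an
    adjoining state)."""
    return [[abs(i - j) <= 1 for j in range(n_states)] for i in range(n_states)]


def random_course(matrix, start, n_steps, rng):
    """Generate a sequence of state indices that obeys the matrix.
    Equal probability among permitted moves is an assumption."""
    course = [start]
    for _ in range(n_steps):
        current = course[-1]
        options = [j for j, ok in enumerate(matrix[current]) if ok]
        course.append(rng.choice(options))
    return course


if __name__ == "__main__":
    m = allowed_matrix(len(STATES))
    for name, row in zip(STATES, m):
        print(name, ["Yes" if ok else "No" for ok in row])
    course = random_course(m, start=1, n_steps=10, rng=random.Random(0))
    print("".join(STATES[i] for i in course))
```

Because the same matrix is used for the input signal and for the control system, any course generated this way can, in principle, be followed step for step by the control.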

This definition is general and does 
not specify the characteristics of the 
input signal or the control system, 
other than indicating certain transi- 
tional restraints for both. The input 
states and the responses to them can 
be discrete or continuous, and the 
input can have any degree of regu- 
larity from nearly random (true 
randomness is denied by conditional 
restraints of the type shown in Table 
1) to completely repetitive. The use 
of discrete states of the input signal 
deserves more than the passing at- 
tention it has been given in the past 
because they are particularly ame- 
nable to statistical structuring in 
terms of first and higher order prob- 
abilities (with the restraints noted). 
Another advantage of discrete inputs 
is that their duration is easily manip- 
ulable, making the number of events 
per unit of time an important dimen- 
sion for investigation. This time 
variable has been termed the “speed 
or pacing factor” (Adams, 1954; 
Conrad, 1951, 1954; Wagner, Fitts,
& Noble, 1954) and is analogous to
number of cycles per second when a
continuous input is used. One promising
measure expressing the statistical
coherency of a discrete input
signal and the duration of its events
is the informational measure of bits
per unit of time (Shannon & Weaver,
1949). The rate of change, as well as
higher derivatives, can also be a
variable for discrete input events but
no attempts have ever been made to
explore these more complex dimensions.


TABLE 1

MATRIX GOVERNING THE ALLOWED TRANSITIONAL STATES FOR THE INPUT SIGNAL AND THE CONTROL SYSTEM

State at       State at time t+1
time t         A      B      C      D
A              Yes    Yes    No     No
B              Yes    Yes    Yes    No
C              No     Yes    Yes    Yes
D              No     No     Yes    Yes

Note. The matrix represents a hypothetical four-state
one-dimensional tracking task. Cells marked with
“Yes” indicate permissible transitions from the ith
state at time t to the jth state at time t+1. “No” entries
are absolute constraints and signify the denial of
transition to a jth state from a prior ith state.
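
The bits-per-unit-time measure can be illustrated for a discrete input with first-order statistical structure. The sketch below is an assumption-laden example, not a procedure from the studies cited: it treats the input as a first-order Markov sequence, and the function names, the power-iteration shortcut for the long-run state proportions, and the example probabilities are all hypothetical.

```python
import math


def stationary(P, n_iter=1000):
    """Approximate the long-run proportion of time spent in each state
    by repeatedly applying the transition matrix P (rows sum to 1)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def bits_per_event(P):
    """Average uncertainty, in bits, of the next state given the present one."""
    pi = stationary(P)
    h = 0.0
    for i, row in enumerate(P):
        for p in row:
            if p > 0.0:
                h -= pi[i] * p * math.log2(p)
    return h


def bits_per_second(P, event_duration_s):
    """Information rate when each input event lasts event_duration_s seconds."""
    return bits_per_event(P) / event_duration_s


if __name__ == "__main__":
    # Four-state input with equal probabilities among the permitted moves
    # (an assumption chosen only for illustration).
    P = [
        [1 / 2, 1 / 2, 0.0, 0.0],
        [1 / 3, 1 / 3, 1 / 3, 0.0],
        [0.0, 1 / 3, 1 / 3, 1 / 3],
        [0.0, 0.0, 1 / 2, 1 / 2],
    ]
    for duration in (1.0, 0.5, 0.25):
        print(duration, round(bits_per_second(P, duration), 3))
```

Halving the event duration doubles the rate in this sketch, which is the sense in which the pacing factor and the statistical structure of the input combine in a single measure.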


The Complexity of Behavior in One- 
Dimensional Visual Tracking 


The purpose of this section is to 
discuss some of the characteristics 
of the response classes which can be 
identified in one-dimensional visual 
tracking, as well as the issues
surrounding them. Visual tracking will
be analyzed because almost all tracking
research has used the visual modality.
However, whatever broad empirical and
theoretical conceptualizations of
tracking behavior eventually mature
will also have to structure the
characteristics of tracking in other
sense modalities. But since other
modalities such as audition have
received only
exploratory attention (Humphrey & 
Thompson, 1952a, 1952b, 1953), it 
seems unduly speculative at this 
time to include them. 

Rather than the servotheory approach
which has been the frame of reference
of some investigators, an attempt will
be made to demonstrate, on the basis
of the available experimental evidence,
that tracking behavior involves a
linked chain of overt and internal
stimuli and responses and is much more
complex than implied by the prominent
error-nulling characteristics of the
servoanalogy. While the servoanalogy
is adequate enough for its schematic
purposes, the behavioral phenomena
cannot be viewed so simply. There
are three major areas for discussion:
the observing response which orients
receptors to sense stimulus events on
the display, the prediction responses
where the operator learns to anticipate
future characteristics of the input
signal, and the hypothesis that
the measured motor response, even
in continuous tracking, is intermittent
rather than the smooth, graded movement
that it might appear to be to a casual
observer.

Most of the phenomena will be discussed
in greatest detail under the heading of
pursuit tracking, and the presence of
the same or similar phenomena in
compensatory tracking will, in most
cases, be obvious. Behavioral
considerations which are uniquely
characteristic of compensatory tracking
will be treated separately.


PURSUIT TRACKING


Observing Response

The observing response senses the
displayed indicants driven by the input
and output signals, as well as the
error difference between them. These
three environmental quantities each
play an important role in pursuit
tracking and their moment-to-moment
state is sampled as the observing
response orients the receptors to them.
The input indicant is the desired
state, the error difference between the
input and output signal represents how
well the desired state is achieved, and
the output indicant gives knowledge of
results on how specific sequences of
motor movements are represented on
the display. Some attention has been
given to the general role of the
observing response (Wyckoff, 1952),
but within the context of tracking it
is considered as having two functions:
head and/or eye movements to direct
the visual receptors to spatially
separated stimuli, and the
discrimination of stimulus change. The
head and/or eye movements can be
considered overt aspects of the
observing response and potentially
measurable (Mackworth & Mackworth,
1958). However, the discrimination
function of the observing response is
an inferred phenomenon, with its locus
unspecified.

Common experience dictates the
necessity for an observing response,
but there is also experimental
evidence which documents its
importance. Adams (1955), using the
Rotary Pursuit Test, found that
repeatedly activating the visual
observing response independently of
the arm-hand goal response, an
operation which presumably served to
fatigue the observing response,
resulted in a goal response decrement.
This permitted the inference that the
performance level of the goal response
is partly determined by the strength of
the intervening observing response.
Another relevant line of evidence is a
study by Poulton (1952b) where it was
found that pursuit tracking performance
deteriorated when the two pointers on
the display were increased in their
spatial separation. One interpretation
of this
finding is that the greater spatial 
separation required more extensive 
orienting of the observing response 
with the result that less time, on the 
average, was devoted to each pointer. 
If the observing response is viewed as
the mechanism by which stimuli are
sampled, then the wider the spatial
separation the less frequently each
source of environmental stimuli is
sampled and the less likely it is that
an appropriate response will be made.
Bearing on
this sampling function of the observ- 
ing response is a vigilance experiment 
by Jerison and Wallis (1957) where 
it was found that the scanning of 
three stimulus sources resulted in a 
lower rate of detecting aperiodic 
stimulus change than when only one 
source had to be watched. 


Prediction Responses 


The input signal in pursuit tracking
actuates an indicant which is
directly observed by the operator.
To the extent that the operator can
predict the regularities inherent in
this input signal he will be able to
anticipate the correct response
movement and initiate it at a time to
minimize error. In the absence of a
predictive capability the operator must
wait for the change in the input signal 
to actually occur on the display, with 
the result that his response will gen- 
erate tracking error as a function of a 
delay of at least one reaction time 
interval. 

Helson (1949) and his associates, in 
their Foxboro studies of tracking 
during World War II, were perhaps 
the first to suggest that prediction 
behavior is manifest in reaction time 
values far less than those obtained in 
classical reaction time experiments. 
Bartlett (1951) has written an
excellent paper on the role of
anticipatory behavior which seems to
be little known and referenced in the
United
States. The most extensive research 
on the prediction of directional course 
changes in the input signal has been 
by Poulton (1952a, 1952b, 1957a, 
1957b, 1957c), and he distinguishes 
between two general classes of pre- 
diction: (a) receptor anticipation, 
which is analogous to the foreperiod 
of the classical simple reaction time 
experiment where a preparatory signal 
is presented to the operator in advance
and establishes a “set” for response,
and (b) perceptual anticipation,
where no advance information
is intentionally given each time but 
the operator nevertheless is able to 
predict the course of future signals on 
the basis of his past experience. It is 
this latter type of anticipation which 
is of greatest interest in tracking in 
that any knowledge of a future state 
of the input signal must be an ac- 
quired or learned prediction; the 
definition of a tracking task does not 
provide for foreknowledge of a state 
of the input signal. In one study 
(1952b) Poulton evaluated anticipa- 
tion in pursuit tracking as a function 
of practice and two levels of input 
complexity—a simple harmonic mo- 
tion and a complex harmonic course. 
Taking an anticipation of change in 
the input signal as a response of dura- 
tion less than the expected reaction 
time of about .20 seconds, Poulton 
found that the subjects were predict- 
ing the simple harmonic course both 
early and late in practice, and that 
the success of prediction was a posi- 
tive function of practice. Although 
overall tracking error decreased with 
practice on the complex input course, 
there was no evidence for improve- 
ment in anticipation and Poulton 
concluded that the improvement was 
largely attributable to increased man- 
ual dexterity. In this study, Poulton 
also investigated the smoothness of 
tracking, defined by the number of
unnecessary discrete changes of speed
that were made. The fewer the num- 
ber of such changes, the better the per- 
formance. With the simple harmonic 
course, it was found that smooth- 
ness of response increased with prac- 
tice but no such changes were found 
for the complex input. Poulton 
viewed his measure of smoothness as 
an additional index of anticipation 
because, when the operator was not 
anticipating, he would tend to wan- 
der off course and his tracking record 
would show a greater number of 
corrective movements. He observes 
that smoothness is a less sensitive 
measure of beneficial anticipation 
than response time because the op- 
erator may be tracking with a large lag 
but nevertheless tracking smoothly. 
Yet, the fact that the subjects tracked
most smoothly for the harmonic input
course which also produced the
greatest degree of anticipation
suggested that the covariation of these
two measures reflects the same
underlying ability to predict stimulus
change in direction.

Another study by Poulton (1952a) 
used the same pursuit tracking ap- 
paratus as in his previous study 
(1952b) and investigated the accu- 
racy with which an operator could 
predict the position of the input indi- 
cant for various amounts of time in 
the future. At the sound of a hammer 
blow the operator had to move the 
output indicant to the position an- 
ticipated for the input indicant when 
a bell sounded .50, 1.5, or 3.5 seconds 
later. This procedure was regularly 
repeated and resulted in a series of 
discrete responses predicting the posi- 
tion of the input indicant. The accu- 
racy of prediction was better than 
chance for both simple harmonic and 
complex harmonic inputs, with the 
accuracy being greater for simple 
harmonic motion. On the basis of 
these experiments, Poulton concluded
that course anticipation is an
important determiner of the overall
proficiency level in pursuit tracking. 
He hypothesized that higher input 
speeds place a greater premium on 
prediction because, as the speed of 
the input signal increases, the failure 
to anticipate means that a greater 
segment of the input course span will 
pass during the subject’s reaction 
time period if he waits for stimulus 
change to actually occur before re- 
sponding and a larger error will de- 
velop. An excellent review of the role 
of prediction in tracking and other 
types of visual-motor tasks has been 
published by Poulton (1957b). 
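
Poulton's hypothesis about input speed can be given a rough numerical form. The sketch below assumes, purely for illustration, that an operator who does not anticipate reproduces a sine-wave input exactly but one reaction time late; the .20-second lag is the reaction time figure quoted in the text, while the amplitude, the frequencies, and the function names are hypothetical.

```python
import math


def peak_lag_error(amplitude, freq_hz, lag_s):
    """Peak of A*sin(w*t) - A*sin(w*(t - lag)), which is 2*A*|sin(w*lag/2)|."""
    w = 2.0 * math.pi * freq_hz
    return 2.0 * amplitude * abs(math.sin(w * lag_s / 2.0))


def sampled_peak_error(amplitude, freq_hz, lag_s, dt=0.001, duration_s=10.0):
    """Numerical check of the same quantity by stepping through time."""
    w = 2.0 * math.pi * freq_hz
    peak = 0.0
    for k in range(int(duration_s / dt)):
        t = k * dt
        err = amplitude * math.sin(w * t) - amplitude * math.sin(w * (t - lag_s))
        peak = max(peak, abs(err))
    return peak


if __name__ == "__main__":
    lag = 0.20  # seconds; reaction time figure taken from the text
    for freq in (0.25, 0.5, 1.0):  # hypothetical input frequencies in Hz
        print(freq, round(peak_lag_error(1.0, freq, lag), 3))
```

Under these assumptions the error produced by a fixed lag grows as the input frequency rises, which is the sense in which faster inputs place a greater premium on prediction.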

A series of investigations by Gotts- 
danker (1952a, 1952b, 1955, 1956) is 
closely related to those of Poulton. 
Gottsdanker’s studies were concerned 
with the prediction of velocities and 
accelerations of input rather than 
directional changes in the course, and 
were subsumed under the label pre- 
diction motion. The experimental ap- 
proach required the subject to track a 
continuous input viewed through a 
narrow slit. The input was printed on 
paper in the form of parallel lines 5 
millimeters apart, and the subject 
responded by trying to keep a pencil 
point between the two lines. He was 
told that when the input disappeared 
he was to project its path into the 
future as if he were attempting to 
follow an airplane that had gone 
behind a cloud. Some of the input 
paths had constant velocities but 
others had motions that were posi- 
tively or negatively accelerated. In 
general, his findings show that con- 
stant velocities are accurately pre- 
dicted, but that the prediction of 
accelerations tended to be of a con- 
stant velocity rather than the re- 
quired increase or decrease in velocity. 
Gottsdanker interpreted this to mean 
that the subject responds on the 
basis of averages or integrations of
preceding velocities. Two studies by
Vince (1953, 1955) used a technique
very similar to Gottsdanker’s in
investigations of what she termed
“intellectual processes” in skilled
performance. Another paper of 
interest on this topic, but not directly 
related to tracking, is by Leonard 
(1953). 
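
Gottsdanker's interpretation can be sketched numerically. The example below is a hypothetical illustration, not a model fitted to his data: it assumes that the subject extrapolates from the last observed position at the mean of the preceding velocities, and the function names, sampling interval, and path parameters are invented for the example.

```python
def true_position(x0, v0, accel, t):
    """Position on a path with initial velocity v0 and constant acceleration."""
    return x0 + v0 * t + 0.5 * accel * t * t


def constant_velocity_prediction(positions, dt, t_ahead):
    """Extrapolate from the last observed position using the mean of the
    preceding velocities (the assumed averaging rule)."""
    velocities = [(b - a) / dt for a, b in zip(positions, positions[1:])]
    mean_v = sum(velocities) / len(velocities)
    return positions[-1] + mean_v * t_ahead


if __name__ == "__main__":
    dt = 0.1
    for accel in (0.0, 1.0, -1.0):  # constant, positively, negatively accelerated
        seen = [true_position(0.0, 2.0, accel, k * dt) for k in range(11)]
        predicted = constant_velocity_prediction(seen, dt, t_ahead=1.0)
        actual = true_position(0.0, 2.0, accel, 2.0)
        print(accel, round(predicted, 3), round(actual, 3))
```

With a constant-velocity path the extrapolation matches the true position; with an accelerated path the averaged velocity lags the current velocity, so the projected path falls short of a positively accelerated course and overshoots a negatively accelerated one.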

The studies by Poulton and by 
Gottsdanker have involved the learn- 
ing of prediction during the course of 
actual practice on a tracking task. A 
related line of investigation, which 
has been given some attention in 
another study by Poulton (1957a), 
is the effect of training to predict the 
stimulus source prior to actual motor 
practice in the total tracking task. 
This can be viewed as a part-whole 
transfer of training approach, where 
prediction responses are considered 
part of the response totality in track- 
ing. Granting this, prediction re- 
sponses should be trainable prior to 
whole-task practice and, in being a 
part of the total response complex, 
should have their strength reflected 
in the dependent motor response 
whose proficiency reflects the strength 
of all the response classes in the
complex. This approach is quite similar
to verbal pretraining methods where 
the operator is required to learn 
verbal responses to task stimuli prior 
to motor responses in the whole task 
(Arnoult, 1957; Goss, 1955). Al- 
though verbal pretraining studies 
have not dealt specifically with the 
problem of prediction responses, they 
are concerned with learned mediating 
responses where response-produced
stimuli are hypothesized to provide
additional discriminative cues for
the motor response (Goss, 1955;
Osgood, 1953). Conceptually,
therefore, they appear quite similar to
prediction responses and one might 
hypothesize that an adaptation of 
these same methods can be used for
prior training in the prediction of
input events in a tracking task. How- 
ever, with our impoverished knowl- 
edge of the underlying nature of antic- 
ipatory mechanisms, it is plausible 
that prediction has nothing to do 
with mediating responses but, in- 
deed, may be fundamentally a pro- 
prioceptive-oriented phenomenon. Giv- 
ing proprioception a role in anticipatory 
behavior needs only the reasonable 
assumption that motor movements 
are conditioned to traces of proprio- 
ceptive stimuli and that, with prac- 
tice, the occurrence of a proper con- 
figuration of proprioceptive stimuli 
will tend to elicit the next correct 
motor sequence. Certainly this is not 
to deny intellective processes or me- 
diating responses as variables in pre- 
diction, but it does suggest that there 
might be at least two facets that
deserve experimental inquiry.
“Prediction response” is a commonly used
label for anticipation in this paper 
but eventually it may prove to be a 
poor term if proprioception proves to 
be a paramount influence. The verbal 
pretraining studies throw the balance 
of the explanatory weight at present 
in the direction of mediating re- 
sponses as the basis for anticipation, 
but definitive research on this topic 
remains to be done. 


Characteristics of the Measured 
Motor Response 


The basic nature of the motor 
movement activating the control 
system in a tracking task has been 
the subject of extensive discussion 
and controversy. The issue is whether 
the motor response is a continuous 
function of time or whether it is dis- 
continuous and intermittent. The 
intermittency hypothesis stems from 
arguments that a responding effector 
has a period of refractoriness or re- 
duced excitability before it can be 
made to respond in full strength
again. Because this evidence stems
from molar behavior data, it is called 
psychological refractory phase to 
distinguish it from the physiological 
refractory phase of individual nerve 
fibers. The similarity of psychological
refractory phase and physiological
refractory phase is in terms of re- 
duced responsiveness following stim- 
ulation and response, but the levels 
of analysis of the two classes of phe- 
nomena are so different that it is per- 
haps safest to view them as analogous 
rather than stemming from a com- 
mon underlying process. 

Probably the first statement of 
psychological refractory phase was 
by Telford (1931) who found that 
reaction time to the second of a pair 
of auditory stimuli was lengthened 
if the time spacing of the two stimuli 
was reduced to .50 seconds, and he 
concluded that the subject becomes 
refractory in a manner comparable 
to the refractoriness of isolated nerves. 
Using Telford’s study as a point of 
departure, Vince (1948a, 1948b) asked 
whether refractoriness is present in 
continuous tracking to give the motor 
response an intermittent, impulsive 
quality. She concluded that
intermittent corrections every .50
seconds are a basic feature of human
tracking
responses in a manner quite compa- 
rable to Telford’s finding for discrete 
stimuli and a reaction time response. 
If her interpretation is correct, the 
notion of psychological refractory 
phase becomes an important general 
principle. But in criticism of Vince's 
findings, psychological refractory phase 
refers to the periodicity of motor 
movements and not tracking error. 
Her conclusions were based on track- 
ing error records and periodicities in 
them are not a function of motor 
movements alone but of the difference 
between the output signal generated 
by the motor movements and the input
signal. Periodicities in the error
function may be correlated with
periodicities in motor movements 
but they are contaminated by the 
influence of the input signal and are 
not an unequivocal index on which 
to base conclusions about psycho- 
logical refractory phase as a mecha- 
nism for inducing intermittent motor 
corrections.
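
The criticism can be made concrete with a small sketch. It assumes, only for illustration, an operator who holds the control perfectly still while a sine-wave input runs; the function names, the sign convention for error (input minus output), and the sampling values are hypothetical.

```python
import math


def error_record(input_signal, output_signal):
    """Tracking error at each sample, taken here as input minus output."""
    return [i - o for i, o in zip(input_signal, output_signal)]


def count_reversals(series):
    """Count changes of direction (rising to falling or the reverse)."""
    reversals = 0
    prev_step = 0.0
    for a, b in zip(series, series[1:]):
        step = b - a
        if step != 0.0:
            if prev_step != 0.0 and (step > 0) != (prev_step > 0):
                reversals += 1
            prev_step = step
    return reversals


if __name__ == "__main__":
    dt = 0.01
    times = [k * dt for k in range(500)]
    input_signal = [math.sin(2.0 * math.pi * 1.0 * t) for t in times]
    output_signal = [0.0 for _ in times]  # no motor movement at all
    error = error_record(input_signal, output_signal)
    print("reversals in output:", count_reversals(output_signal))
    print("reversals in error:", count_reversals(error))
```

The error record shows regular reversals although the output shows none, so periodicity in an error record cannot by itself be attributed to periodicity in the motor movements.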

With the exception of the foregoing 
studies by Vince, research on motor 
intermittence has been with discrete 
tasks, although many of the investi- 
gators have freely implied the gen- 
erality of the phenomenon to include 
continuous tracking. Mainly, these 
studies conclude that reaction time 
to a second stimulus of a pair will be 
lengthened if the interstimulus inter- 
val is less than .50 seconds. A limit 
to this generalization is that very 
brief interstimulus intervals cause 
the stimuli to be perceived as a single 
entity, with the result that only a 
single response occurs. Vince (1948b, 
1949) and Hick (1948) have both 
used discrete tracking tasks and have 
provided additional corroborative evi- 
dence on psychological refractory 
phase. Craik (1947, 1948) and Wel- 
ford (1952) use these data for theo- 
retical discussions on the generality 
of psychological refractory phase as a 
determinant of intermittency in re- 
sponding. Poulton (1950) criticized 
the tendency to regard the refractory 
interval of .50 seconds as a human 
constant because the quasirandom 
presentation of stimuli did not allow 
the operator to form a proper pre- 
paratory set. When allowance is made 
for the acquisition of a preparatory 
set by having predictable stimuli, 
Poulton found that the refractory 
phase interval reduces to .20-.40
seconds. Davis (1956) and Elithorn 
and Lawrence (1955) also discuss 
the role of anticipatory set and the 
psychological refractory period. A 
general discussion of research and
views on this topic is presented by
Fitts (1951). 

Another aspect to the intermit- 
tency hypothesis is that the duration 
of patterned movements to discrete 
stimuli can be less than visual-motor 
reaction time. This has implied re- 
sponse discontinuity to some in- 
vestigators because the subject is 
executing response sequences mo- 
mentarily independent of the magni- 
tude of the visually perceived error. 
Because the response is not con- 
tinuously guided by the primary 
visual tracking error quantity, it is, 
for a time, open-loop or intermittent 
(Searle & Taylor, 1948; Taylor & 
Birmingham, 1948). These authors 
conclude that response movements 
are under a kind of a “cam control” 
where the visually perceived error 
triggers a cammed sequence. On the 
basis of past experience the “cam”
runs off the continuously varying
force pattern, including starting and
stopping, and all without visual or
proprioceptive feedback.

Admitting the possibility that the
continuous control of movements
during an interval less than that re- 
quired for visual-motor reaction time 
can be proprioceptive feedback, 
Chernikoff and Taylor (1952) con- 
ducted a study to see if kinesthetic 
reaction time was sufficient to ac- 
count for the control of the response. 
They concluded that continuous 
tracking behavior is best described 
by the intermittency hypothesis, 
analogous to cam control where very 
brief movement sequences are run 
off in the absence of visual and
proprioceptive guidance. Lashley (1951),
in a parallel line of argument, is in
agreement that kinesthetic reaction 
time cannot explain many facts of 
motor responding such as the finger 
movements of a skilled pianist mov- 
ing at about 16 per second. These 
rates are too fast to allow kinesthetic
feedback after each one, and Lashley
postulates that some central sensory 
control is operating, presumably in a 
fashion similar to the cam hypothesis 
stated by Taylor and his associates. 
Craik (1947) holds a similar view. 
Arguing from piano playing to tracking
is tenuous, however, if for no
other reason than that a musical 
composition provides foreknowledge 
of a requirement for movement se- 
quences, and reaction time to each 
one is known to be greatly shortened 
under these special conditions (Vince, 
1949). Advance notice of stimuli is 
not a characteristic of tracking tasks. 
Moreover, Poulton’s work (e.g., 
1952b) has shown that learning to 
anticipate stimulus sequences is re- 
vealed in greatly shortened reaction 
time values. It is hardly surprising 
that a trained musician can some- 
times sidestep the restraints of an 
elementary afferent-efferent loop and 
receive guidance from learned, inter- 
nal sources. 

Gibbs (1954a, 1954b) in two im- 
portant papers effectively argues 
against the hypothesis that con- 
tinuous motor movements do not 
have continuous kinesthetic feedback 
guiding them. He points out that 
arguments based on kinesthetic reac- 
tion time fail to distinguish between 
the connecting and conducting func- 
tions of the central nervous system. 
Gibbs observes that kinesthetic reac- 
tion time to discrete stimuli can be 
considered the connecting time be- 
tween kinesthetic stimulation and 
overt motor response, and this has 
little bearing on continuous kines- 
thetic or neural conduction during 
voluntary movement. Gibbs bases 
his discussion on physiological data 
by Matthews (1933) which showed 
that a muscle had “tension” afferents
and “stretch” afferents which,
respectively, provide sensing of static
position and of movement of a limb.
Tension afferents respond primarily
when the muscle is at rest and have an
electrical discharge approximately
proportional to the logarithm of the
tension. Stretch afferents, on the
other hand, respond when the muscle
is stretched in movement and have a
rate of electrical discharge
proportional to the rate of stretch, and
Gibbs holds that this is the source of
continuous kinesthetic feedback
monitoring. The subject must “know”
limb position in guiding his movements
and Gibbs holds that this is obtained
by integrating the rate function. The
notion of a finite integration
period might suggest that Gibbs’
hypothesis is essentially the same as
the intermittency hypothesis because
successive integrations might
be revealed as intermittent movements
of .50 seconds as limb position
is successively “computed.” Actually,
the implications are quite
different because Gibbs’ hypothesis
would seem to hold that there are
conditions where an integration
interval of .50 seconds would apply
but that integration intervals of
longer duration are equally possible.
Gibbs’ physiological hypothesis
would seem to allow for perfectly
smooth tracking movements of relatively
long duration and, indeed, this
is a common observation in tracking
records. Oddly enough, relatively
long periods of smooth responding in
continuous tracking have not served
as grounds for seriously challenging
the intermittency hypothesis. Craik
(1948) and Noble et al. (1955) remark
on these smooth responses and
offer the ad hoc explanation that
intermittent movements are occurring
in accordance with the principle
of psychological refractory phase but
that the subject’s acquired capability
to predict input sequences has
overlaid a smoothing effect. While
prediction responses may well have some
sort of smoothing influence, it also
may be true that the intermittence 
hypothesis is false for continuous 
tracking and that relatively long, 
smooth responses frequently occur in 
the absence of prediction behavior. 
Gibbs’ work emphasizes the rather 
simple fact that the intermittency 
hypothesis has its validity derived 
from research on discrete tasks and 
its generalization to continuous 
tracking may be inappropriate. 
Gibbs’ use of Matthews’ findings 
raises the interesting idea that
proficiency in making accurate
accelerations in tracking is related to the
subject’s ability to discriminate 
changes in the rate of kinesthetic 
impulses. One interpretation of 
Gottsdanker’s findings (1952a, 1952b, 
1955, 1956) that the subject poorly 
predicts velocity changes is that he 
cannot kinesthetically discriminate 
with enough accuracy those velocity 
changes which he visually perceives. 
However, this interpretation must 
be approached cautiously because it 
fails to consider that the inability to 
discriminate velocity changes con- 
ceivably could be on the visual-per- 
ceptual side rather than the kines- 
thetic. To interpret Gottsdanker’s 
data properly we must, by indepen- 
dent operations, determine the rela- 
tive capabilities of perceptual and 
kinesthetic discrimination of accel- 
eration. If the operator cannot per- 
ceptually discriminate the velocity 
changes involved, then the motor 
response system is not receiving ade- 
quate information and the overt re- 
sponse cannot be expected to reflect 
information that has not been re- 
ceived. Or, conversely, the operator 
may be perfectly able to discriminate 
the velocity change perceptually but 
he may be unable to translate it into 
the proper accelerated movement
because he cannot make sufficiently
accurate kinesthetic discriminations.
Some work on the perceptual
discrimination of instantaneous changes
in velocity has been done by Hick 
(1950) and Brandalise and Gotts- 
danker (1959). Too, this general line 
of reasoning suggests the hypothesis 
that the relative effectiveness of posi- 
tion, rate, and acceleration tracking 
may be related to the compatibility 
of perceptual and kinesthetic events. 
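
Gibbs' suggestion that limb position is obtained by integrating a rate signal has a simple numerical analogue. The sketch below is only an arithmetic illustration and makes no physiological claim: the function name, the trapezoidal rule, and the sample values are assumptions.

```python
def integrate_rate(rates, dt, start_position=0.0):
    """Accumulate position from equally spaced rate samples using the
    trapezoidal rule; returns one position per sample."""
    positions = [start_position]
    for r0, r1 in zip(rates, rates[1:]):
        positions.append(positions[-1] + 0.5 * (r0 + r1) * dt)
    return positions


if __name__ == "__main__":
    dt = 0.1
    steady = [2.0] * 11                    # constant rate of movement
    speeding = [k * dt for k in range(11)]  # steadily increasing rate
    print(round(integrate_rate(steady, dt)[-1], 3))
    print(round(integrate_rate(speeding, dt)[-1], 3))
```

Nothing in such an integration fixes the length of the interval over which it runs, which is in keeping with the point that integration intervals longer than .50 seconds are equally possible.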


COMPENSATORY TRACKING 


The discussion of research and 
problems under the heading of pur- 
suit tracking applies also to com- 
pensatory tracking. Whatever dif- 
ferences exist are resident in the 
different ways in which the two types 
of tracking tasks have their data 
organized on the display. The pre- 
sentation of only the error quantity 
in compensatory tracking means that 
performance usually will be poorer 
for two reasons: 

1. The operator cannot see the input
signal directly, which means that
he is handicapped in the acquisition
of prediction responses.

2. The operator cannot see the
output signal directly so he is
handicapped in receiving knowledge of
results. In addition to influencing the
acquisition of simple visual-motor
learning where prediction behavior is
absent, this factor also influences the
acquisition of prediction responses
because the operator cannot
unequivocally verify the results of any
particular prediction response.

Depending upon task circumstances,
some prediction behavior
can be expected to form under com- 
pensatory tracking conditions. The 
error signal is a function of both the 
input and the output signals, and at 
times the regularities in the input
will be discernible. Poulton (1952b)
has shown that prediction behavior
does occur with practice in
compensatory tracking but that
prediction is impressively superior in
pursuit tracking. Undoubtedly this is
one of the
factors which almost always renders 
pursuit tracking superior to com- 
pensatory tracking (Hartman & 
Fitts, 1955; Poulton, 1952b). 
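
The difference in what the two displays present can be stated in a few lines of code. The sketch is illustrative only: the function names, the dictionary layout, and the sign convention for error (input minus output) are assumptions.

```python
def pursuit_display(input_value, output_value):
    """A pursuit display shows both indicants; the error is visible as the
    gap between them."""
    return {
        "input": input_value,
        "output": output_value,
        "error": input_value - output_value,
    }


def compensatory_display(input_value, output_value):
    """A compensatory display shows only the error quantity."""
    return {"error": input_value - output_value}


if __name__ == "__main__":
    # Two different situations that look identical on a compensatory display.
    print(compensatory_display(5.0, 3.0), compensatory_display(2.0, 0.0))
    print(pursuit_display(5.0, 3.0), pursuit_display(2.0, 0.0))
```

Because different combinations of input and output produce the same error reading, the operator of a compensatory display cannot tell from a single reading whether a change came from the input or from his own movement, which bears on both of the handicaps listed above.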

Nor can we assume that the ab- 
sence of a direct presentation of the 
output signal means that knowledge 
of results is completely absent. There 
is evidence from a study by Cherni- 
koff and Taylor (1957) that when the 
input signal of a continuous tracking 
task is a low frequency input the sub- 
ject receives fairly adequate knowl- 
edge of results, probably because the 
motor movements produce output 
frequencies which are higher than the 
input frequency changes. This is de- 
duced from the slightly better per- 
formance that was found in this study 
for compensatory over pursuit track- 
ing when the input was a low fre- 
quency signal. At higher frequencies, 
they found that pursuit tracking 
maintained its well-known advantage 
over compensatory tracking. 


TWO-DIMENSIONAL TRACKING


A two-dimensional tracking task 
has two stimulus sources command- 
ing response, with each source hav- 
ing its own separate input signal and 
a dimension of the control system for 
response. An example of a two-di- 
mensional visual tracking task would 
be two voltmeter stimulus sources 
with a left-hand control lever for re- 
sponse to one and a right-hand lever 
for response to the other. Or, the
two stimulus sources could have a 
bisensory distribution, with one vis- 
ual and one auditory. Our ignorance 
of variables involved in the various 
ways to organize a two-dimensional 
tracking task dictates that only a 
limited examination of some of the 
issues be made. The discussion will 
be restricted to two cases: spatially 
separated visual sources, and bisen- 
sory sources where one is auditory 
and the other is visual. Nothing of 
importance is known of the effects of 
control system design as it bears on 
the distribution of the two response 
dimensions among the possible effec- 
tor systems, so it will not be dis- 
cussed. Nor will the relative ad- 
vantages of pursuit and compensa- 
tory displays be discussed for what- 
ever special implications might be 
found for two-dimensional tasks. 
Because almost all of the research 
in tracking has employed one-dimen- 
sional visual tasks, it is unfortunately 
necessary to attempt this preliminary 
discussion of two-dimensional tasks 
on a rather thin foundation of em- 
pirical findings. Perhaps the dearth 
of analytical data on more complex 
tracking tasks is because of the im- 
plicit view of many psychologists 
that it is desirable to progress in re- 
search from simple to complex sys- 
tems, and that the laws of complex 
systems will tend to fall into place 
once the relationships for simpler 
tasks are established. On the other 
hand, it is possible to defend the 
position that parallel law-seeking at 
two levels of analysis will result in 
two bodies of laws, each appropriate 
for its own domain. As these two 
bodies of knowledge develop, specific 
research can then be directed towards 
finding the empirical composition 
laws which express the interactions 
relating the laws of the two strata. 
If this view is allowed, it does not 
seem necessary that the study of 
multidimensional tracking tasks 
should await the codification of laws 
governing one-dimensional tracking. 

To facilitate exposition, the follow- 
ing terminology has been adopted. 
Each stimulus source and its dimen- 
sion of the control system will be 
called a component task of the total 
task. Response in the component 
task will be termed the component 
response of the total response. As 
before, the observing response will 
serve to orient the receptors to the 
events emitted by the stimulus 
sources. 

Visual Tracking 

Observing response. One of the dis- 
tinguishing features of a two-dimen- 
sional visual tracking task is that 
there is not only the need for scan- 
ning within a source but also the more 
demanding requirement to scan be- 
tween sources. This added response 
requirement is importantly a product 
of the task variable called load (Con- 
rad, 1951, 1955). Load is defined as 
the number of stimulus sources and 
has an expected interaction with the 
rate of events emitted from each 
source. This latter variable has been 
termed speed (Conrad, 1951, 1954). 
Performance deteriorates both with 
increase in speed and load. More- 
over, it has been shown that response 
proficiency is a function of the extent 
to which events in spatially distrib- 
uted sources overlap in time and com- 
mand two simultaneous responses 
(J. F. Mackworth & N. H. Mack- 
worth, 1956; N. H. Mackworth & 
J. F. Mackworth, 1956, 1957). An- 
other important task variable which 
would certainly interact with speed 
and load in determining the observing 
response is the amount of the spatial 
separation of the sources. Tracking 
proficiency as a function of the 
amount of spatial separation has not 
been systematically studied. 

Prediction responses. An impor- 
tant but unverified implication for 
prediction responses in a two-dimen- 
sional visual task is that they might 
reduce the major requirement for 
visually scanning the stimulus 
sources and improve tracking per- 
formance. Prediction responses in a 
one-dimensional task are known to 
benefit motor performance. We might 
hypothesize that in a two-dimen- 
sional task there is not only predic- 
tion within each source but also pre- 
diction between-sources. If the hu- 
man operator can learn to predict 
events within a source, it would seem 
that he might learn of the covaria- 
tion between the events of the two 
sources. Given an event in one source 
he would have some likelihood of cor- 
rectly predicting the concurrent 
event in the other source and conse- 
quently would not need to attend 
visually to this source as often. We 
know nothing of these matters but 
between-source prediction is a rea- 
sonable expectation. 

Component response differentiation. 
Two-dimensional tracking often in- 
volves two or more component re- 
sponse effector systems, such as both 
hands or a hand and a foot, and this 
raises the issue of motor interaction 
between the two systems. It is a 
common observation that initial 
stages of total response in a multi- 
dimensional task are often typified by 
a level of uncoordinated activity and 
error far greater than might be ex- 
pected from low habit strength in 
each component response separately. 
But, as practice proceeds, these in- 
teractions of component responses 
tend to drop out completely or show 
a marked decrease, with each par- 
ticipating component response effec- 
tor system becoming smoothly pro- 
ficient. This phenomenon shall be 
called component response differen- 
tiation. Within the framework of his 
S-R contiguity theory, Guthrie 
(1952) discusses the acquired differ- 
entiation of component responses: 

. . . reduction of habit to essentials makes 
many habits local responses no longer involv- 
ing the whole body. When we are practiced 
we drive and talk, or play the piano and 
smoke, or skate and greet a friend at the same 
time. At first this is impossible because driv- 
ing, playing, skating all include a mass of 
action that is not essential to the performance 
but is present because it is part of total associ- 
ated complex bound together by conditioning. 
In time, many irrelevant movements are 
dropped out from the complex and the activ- 
ity is limited to the muscles and the move- 
ments required for the performance. This 
process is, of course, never complete. Perfect 
grace, which means the use only of the essen- 
tial muscles and this use only to the point 
necessary for the action, is only approximated, 
never reached (p. 109). 


How component response interac- 
tion is manifested in a two-dimen- 
sional tracking task is not known at 
this time. However, the extensive 
literature on experimentally induced 
muscular tension, which has been 
organized by Meyer (1953) in terms 
of physiological hypotheses, leaves 
little doubt that interaction of simul- 
taneous motor responses occurs. The 
concern of Meyer’s review and anal- 
ysis was the effects of experimentally 
induced muscular tension where us- 
ually a static, muscular tension-in- 
ducing component response accom- 
panies a more central learning ac- 
tivity, such as rotary pursuit or 
paired-associates learning. The ma- 
jor area of interest for two-dimen- 
sional tracking, but where much less 
is known, concerns total tasks where 
all component tasks impose a learn- 
ing requirement on their respective 
component responses. Perhaps, as 
Guthrie suggests, the interaction will 
all but disappear. But until a means 
of defining and measuring the course 
of component response differentia- 
tion in tracking is uncovered, there 
is no reason for discussion beyond 
this passing mention of a potentially 
important area. 


Visual-Auditory Bisensory Tracking 


The major issue for two-dimen- 
sional tracking with one visual and 
one auditory source is whether there 
is interaction which intrinsically pre- 
vents the two stimulus event streams 
from being processed simultaneously. 
While we might intuitively surmise 
that a total task organized in this 
manner will be superior to two-di- 
mensional visual tracking because 
each stimulus stream, with its own 
sense modality, gives the operator a 
higher load carrying capability in 
that he does not have to time-sample 
the sources with the observing re- 
sponse as he does in two-dimensional 
visual tracking, there is no evidence 
of the conditions for which this can 
be true, if at all. As a first experi- 
mental question it would seem de- 
sirable to attack the pure case in 
two-dimensional bisensory tracking 
and ask whether it is possible to 
simultaneously process two stimulus 
streams without impairing interaction 
effects. Any research program should 
have a strategy which sets up a 
hierarchy of research questions whose 
answers are ordered in terms of their 
contribution to the delineation of 
variables and laws, and in bisensory 
tracking the best strategy is sug- 
gested to be one of first determining 
whether the human operator can 
process two event streams at once. 
Having determined the empirical 
truth or falsity of this hypothesis, we 
will be in a better position to com- 
paratively examine the relative mer- 
its of all-visual and bisensory tasks. 
Later variables to consider would be 
the differential capabilities of the 
visual and auditory senses for dif- 
ferent classes of stimulus inputs 
(Henneman & Long, 1954). 
Subjectively we all have the con- 
fident feeling that we can handle 
visual and auditory events simul- 
taneously. It is commonplace to en- 
counter the observation that one 
can simultaneously read a book and 
listen to the radio. Yet, as with most 
anecdotal accounts, they may be true 
but the absence of experimental con- 
trols precludes any proof of the 
thesis. Thus, an explanation of these 
experiences of everyday life is just as 
plausible in terms of rapid sensory 
shifting from one data stream to 
another. Experimentally, the issue 
is a delicate one and will require 
careful analysis and experimentation 
to decide it conclusively. 

The experimental design necessary 
to prove or disprove that the human 
operator is a one-channel system 
must, as a minimum, show that per- 
formance of each component re- 
sponse in a bisensory tracking task 
will, after practice, be the same as 
performance when each component 
task is practiced out of total task 
context as a separate task. But what 
interpretation can be given if com- 
ponent response measures in bisen- 
sory tracking performance fail to 
achieve the level attained on part 
tasks? The hypothesis that the hu- 
man operator is a single channel data 
processing system is supported but 
the investigator is then faced with 
the new question of the locus of the 
interaction. There are at least four 
possibilities which must then be re- 
solved, although it will take some 
ingenuity and analysis to opera- 
tionally differentiate them for lab- 
oratory testing: 

1. The human operator is truly a 
one-channel system and, when two 
units of stimuli arise simultaneously, 
one must be temporarily stored while 
response occurs to the other. At the 
completion of the first response, the 
second stimulus unit is removed from 
central storage and response is made 
to it. 

2. No storage is required. The 
operator is capable of simultaneously 
processing two event streams but 
there is motor interaction which pre- 
vents the two responses from simul- 
taneously occurring with the same 
effectiveness that would be observed 
for any one of them separately. In 
effect, this hypothesis is consistent 
with Guthrie's position that there is 
always some interaction between 
simultaneously functioning response 
systems, even after very large 
amounts of practice. Component re- 
sponse differentiation is never com- 
plete. 

3. No storage is required and there 
is no interaction of responses at the 
motor level. However, there is sen- 
sory interaction which results in a 
degradation in performance that 
would be absent if only one stream of 
stimuli were being handled. Evi- 
dence for sensory interaction is pre- 
sented in a number of papers (Child 
& Wendt, 1938; Gilbert, 1941; Gregg 
& Brogden, 1952; Hartman, 1934; 
London, 1954; Ryan, 1940). 

4. Combinations of the above 
three possibilities. 

The most relevant work on 
simultaneous bisensory data process- 
ing for tracking is by Davis (1957). 
While he did not study tracking or 
even strict simultaneity of bisensory 
events, he did study the effects of 
very small time intervals between a 
visual and an auditory stimulus and 
the experiment makes a significant 
contribution to the topic. Following 
the generalizations on psychological 
refractory phase, Davis asked 
whether the operator is refractory if 
the second of two successive stimuli 
impinges on a different sense modal- 
ity than the first. Using the reaction 
time response and stimuli of very 
brief duration, Davis found that the 
reaction time to the second signal in- 
creased as the interstimulus interval 
decreased. The data show that the 
phenomenon which has come to be 
known as psychological refractory 
phase operates for two successive 
stimuli in two sense modalities about 
as it does for two successive events 
in a single sense modality. In some 
fashion, a “queuing of signals,” to 
use the engaging phrase of Davis, 
occurs whether stimuli arrive over 
one or two sense channels. Davis 
finds his data consistent with a model 
of the human operator as a single 
channel information system. If we 
can assume that the processing of 
simultaneous events is a special zero- 
interval case of intervals for succes- 
sive stimuli, the extrapolation of the 
Davis findings to the zero interstimu- 
lus interval suggests a substantial 
impairment in performance. An em- 
pirical study of truly simultaneous 
events must be done but the Davis 
experiment is unquestionably pro- 
vocative on the simultaneity issue. 
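
The queuing interpretation can be made concrete with a toy calculation. The sketch below is not Davis's model and uses none of his data. It assumes, purely for illustration, that the first signal occupies a single central channel for a fixed time and that a second signal arriving during that time must wait for the remainder. The function name and the millisecond values are hypothetical.

```python
def second_reaction_time(isi_ms, central_ms=150.0, base_rt_ms=200.0):
    """Predicted reaction time to the second of two successive signals.

    Illustrative assumption only: the first signal holds one central
    channel for `central_ms`; a second signal arriving `isi_ms` later
    is queued for whatever part of that time remains.
    """
    if isi_ms < 0:
        raise ValueError("interstimulus interval must be non-negative")
    waiting = max(0.0, central_ms - isi_ms)  # time spent in the queue
    return base_rt_ms + waiting


if __name__ == "__main__":
    # Shorter intervals give longer second reaction times; the zero
    # interval is the simultaneous case discussed in the text.
    for isi in (0, 50, 100, 150, 300):
        print(isi, second_reaction_time(isi))
```

Under these assumptions the delay is greatest at the zero interval and vanishes once the interval exceeds the assumed channel time, which is the qualitative pattern the extrapolation in the text relies on.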


SUMMARY AND CONCLUSIONS 

This paper has reviewed some of 
the major issues and problems in the 
study of human tracking behavior. 
Apart from the complexities that are 
inherent in the analysis of closed-loop 
behavior, which is somewhat more 
complicated than the open-loop sit- 
uations used by most psychologists 
in their studies of human behavior, 
tracking behavior is beset with the 
added complications of mediating re- 
sponses and stimuli which are im- 
portant variables intervening be- 
tween the display and the measured 
motor response. Moreover, all of 
these variables assume further com- 
plications when they are cast in the 
matrix of multidimensional tracking 
tasks with two or more stimulus 
sources, each with a corresponding 
dimension of the control system for 
response to them. And, not only do 
multidimensional tasks have com- 
plications resulting from a compound- 
ing of the effects of variables found 
in one-dimensional tracking, but 
they have the added issues of how 
one or more sense modalities process 
the incoming data and how the 
component response systems interact 
throughout learning to become partly 
or completely noninteractive (differ- 
entiated). We appear to be a long 
way from understanding these fac- 
tors and, until we do, we are a long 
way from the beginnings of any kind 
of theory of tracking. British re- 
search has been most influential in 
illuminating the characteristics of 
tracking behavior, with its experi- 
mental examination of what is 
learned (e.g., prediction behavior in 
tracking), and its study of the inter- 
mittency hypothesis. This approach 
of British investigators would seem 
to be mandatory for our eventual 
theoretical description of tracking, 
and is in some contrast to the ap- 
proach of engineering psychologists 
in the United States who tend to em- 
phasize measures of tracking be- 
havior as a function of task variables 
and often bypass detailed analyses of 
the learned behavior. Some im- 
portant exceptions to this emphasis 
on the domestic scene have been the 
early work of the Naval Research 
Laboratory, Gottsdanker, and recent 
work by Briggs and his associates on 
learning and transfer as a function of 
task variables. 

If this paper can be said to have a 
point of view, it is that tracking re- 
search is in need of a rapprochement 
of the interests of the engineering 
psychologist, with his focus on task 
variables and the measurement of 
time-based behavior, and interests of 
the traditional experimental psy- 
chologist who tends to emphasize be- 
havior as a function of variables 
which determine conceptual states 
such as habit, work inhibition, mo- 
tivation, mediating responses, etc. 
Physical servotheory has been a 
prominent attempt in engineering 
psychology to describe tracking be- 
havior, but the absence of variables 
defining conceptual states long 
known to influence behavior elimi- 
nates it as a psychological theory of 
any stature, quite apart from its for- 
mal shortcomings for the description 
of nonlinear human behavior. It is 
unlikely that a theory of tracking be- 
havior will emerge until these con- 
ceptual variables are included, along 
with time series measurement and 
task variables which traditionally 
have occupied engineering psychol- 
ogy. 


The purposes of this paper are to 
consider some difficulties involved in 
matching problems with multiple 
judges and objects, and to present 
some appropriate techniques for the 
analysis of such data. The matching 
problem is the problem of evaluating 
the accuracy of a set of judgments 
about a series of objects. In the usual 
form of the problem, a judge places 
each object into one of several speci- 
fied and unordered categories. Since 
the number of categories is finite and 
is ordinarily small, each such set of 
judgments has an appreciable prob- 
ability of occurrence by chance. It 
will be obvious that there is always 
an external or a priori criterion for 
scoring each judgment as correct or 
incorrect. As Mosteller and Bush 
indicate (1954, pp. 307-308), the 
matching problem is present in many 
apparently different designs which 
call for identifying, diagnosing, or 
otherwise classifying objects, persons, 
or responses. Examples are guessing 
the order of a deck of ESP cards, 
diagnosing clinical cases, and identi- 
fying the products of each of several 
designated persons. 

The several questions asked by the 
experimenter may include: Can these 
objects be classified by these judges 
with better-than-chance success? If 
so, are some judges more successful 
than others, and are some objects 
more successfully classified than other 
objects? It will be seen that these 
questions are not specific to the match- 
ing problem: e.g., they may also be 
asked in the analysis of responses to 
an aptitude test. What is distinctive 
about the matching problem is that 
the number of categories is usually 
larger than two, that each is of equal 
interest (as contrasted to the usual 
psychometric preoccupation with 
“right” answers), and that the num- 
ber of objects classified by each judge 
is usually small. When the judge 
makes judgments about many ob- 
jects in each of the true categories, 
psychometric scoring methods pro- 
vide scores that can be analyzed by 
familiar statistical techniques. 

1 I am indebted to many colleagues for their 
Sawyer, Charles van Buskirk, and Virgil 
Willis. 

The matching problem takes sev- 
eral forms, depending primarily on 
the number of categories into which 
the objects fall. The number may be 
from two to O (the number of ob- 
jects). Another variable in such 
designs is the judge’s information 
concerning the distribution of cases 
over categories: e.g., he may have 
some prior knowledge which would 
constrain his judgments, or he may be 
told how many objects fall in each 
category; if the categories are male 
and female, he would put approxi- 
mately 50% of the objects in each 
category, or he might be informed 
that exactly 50% were of each sex. 
For brevity, this paper will be limited 
to the general form of the problem in 
which the instructions do not fix the 
distribution of judgments and in 
which the objects are from a specified 
class. Thus the task may be to judge 
whether or not each object has a cer- 
tain property, or the analysis may be 
limited to evaluating the accuracy of 
judgments about objects in a given 
category, these having been pre- 
sented along with objects from other 
categories. 


THE ONE-VARIABLE CASE 


Consider the limited case in which 
one judge, j, makes judgments about 
each of a set of O randomly selected 
objects. The experimental question 
is whether his success is significantly 
greater or less than that which would 
be expected from chance matching 
(due to his ability, his biases, sys- 
tematic errors, etc.). Statistical 
techniques have been developed and 
presented by various writers. Mostel- 
ler and Bush (1954) give the most 
comprehensive account. Also perti- 
nent are papers by Chapman (1934, 
1935, 1936), Dudek (1952), McHugh 
and Apostolakos (1959), Roberts 
(1958), and Vernon (1936a, 1936b, 
1936c). Mathematical treatments 
and further references are given by 
Battin (1942), Cochran (1950), Gil- 
bert (1956), Stevens (1938), and 
Wilks (1943, pp. 208-213). 

The conclusions from this design 
obviously apply only to Judge j and 
the population from which the O ob- 
jects were drawn. Such a study would 
be of value if positive results per- 
mitted the conclusion that there is at 
least one judge who can correctly 
judge objects of this type. Negative 
results would have no value unless 
there were something distinctive 
about the one judge. 

In parallel fashion, one object, o, 
might be judged by a set of J judges, 
with the conclusions applying to that 
one object and the population from 
which the judges were drawn. While 
such an experiment would be worth- 
while if the object had special signifi- 
cance, inability to generalize to other 
objects would usually make the study 
of little value. 

The necessity for representative 
designs in studies of this sort has 
been pointed out by various writers 
(Crow, 1954, 1957; Hammond, 1954; 
Secord, 1952). (For a pertinent dis- 
cussion of the problems of interpret- 
ing the results of matching studies, 
see Cronbach, 1948.) 


THE TWO-VARIABLE CASE 


The one-variable case can be repli- 
cated with J different judges, each 
being assigned a random sample of 
objects, the several samples being ob- 
tained independently. If a p value is 
obtained for each judge, the findings 
for the several judges can be pooled 
(e.g., through the chi square trans- 
formation—see Jones & Fiske, 1953). 
If the objects are randomly chosen, 
one can make inferences about judg- 
ments of objects in the population 
from which they were drawn. If the 
judges are also randomly drawn, 
inferences can be made to the popula- 
tion from which they came. Other- 
wise, the inferences must be limited 
to the particular objects or the par- 
ticular judges, respectively. This 
design is suitable for testing for non- 
randomness of judgments, but does 
not permit a comparison between ob- 
jects. It is not optimal for testing 
for differences between judges since 
differences between samples of ob- 
jects would contribute to apparent 
differences between judges. 
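
The pooling step can be sketched in code. The example assumes that the chi square transformation referred to is the familiar combination in which minus twice the sum of the natural logarithms of k independent p values is referred to chi square with 2k degrees of freedom; the source does not spell out the formula, so this reading is an assumption. The function names are hypothetical, and the closed-form tail used here holds only for even degrees of freedom, which is all the method requires.

```python
import math


def chi_square_upper_tail_even_df(x, df):
    """P(chi square with even df exceeds x), from the closed-form series."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)**i / i! incrementally
        total += term
    return math.exp(-half) * total


def pool_p_values(p_values):
    """Pool independent p values: X2 = -2 * sum(ln p), df = 2k (assumed form)."""
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p values must lie in (0, 1]")
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    df = 2 * len(p_values)
    return statistic, df, chi_square_upper_tail_even_df(statistic, df)


if __name__ == "__main__":
    # Hypothetical p values, one per judge, each from an independent sample.
    print(pool_p_values([0.10, 0.10, 0.10]))
```

The pooled probability is meaningful only when the p values are independent, which is why this design requires that the several samples of objects be obtained independently.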

In another design, the O objects are 
randomly assigned to the J judges 
(J = O), each judge making one judg- 
ment and each object being judged 
once. This appears to be an excellent 
design for testing whether judges of a 
certain kind can judge correctly 
about objects of a specified type: a 
given amount of judge effort is spread 
over the largest possible number of 
objects so that the errors of sampling 
judges and objects would tend to be 
minimized. The resulting data can be 
analyzed by an appropriate statistical 
test from those in the references cited 
above: e.g., a test to determine 
whether the overall proportion of 
hits departs significantly from that 
expected on the basis of chance. 

Such a design is ordinarily not used 
because it does not permit analysis of 
judge differences, object differences, 
or judge-object interactions. It is 
also not economical of the experi- 
menter’s effort insofar as each judge 
must be instructed or trained and 
each object must be prepared for 
presentation to the judges. 

In a more common design, each 
judge judges each object. The re- 
sulting data can be recorded in a 
bivariate table, the rows indicating 
objects, the columns the judges, and 
the entry indicating success or failure 
(e.g., 1 or 0). 

But the several observations in 
each column may not be independent 
of each other, and neither are those in 
each row—just as, in the restricted 
case, the results apply only to the one 
judge or to the one object. Therefore, 
the set of JO observations cannot be 
treated as independent. To test the 
departure of the grand mean of the 
JO observations from chance expect- 
ancy, an error term is needed. The 
variance of the JO distribution is un- 
satisfactory because its actual de- 
grees of freedom are not known. This 
dependence among observations is the 
crucial problem in most matching 
experiments. 

This critical problem has been seri- 
ously slighted or overlooked in previ- 
ous work. Mosteller and Bush (1954, 
p. 311) combine results for several 
judges on one set of objects without 
reference to the consequent restric- 
tion on the conclusion. Vernon 
(1936c) pointed out that significant 
differences between judges or between 
objects would introduce a marked 
bias in his method for analyzing 
matching data. He therefore pro- 
posed that, where such differences 
are known or suspected, the data for 
the average judge (or average object) 
be the basis for inference. This sug- 
gestion would seem to involve throw- 
ing out most of the available informa- 
tion and to increase the danger of 
making a Type II error. For example, 
suppose that each judge does better 
than chance but the data for the 
average judge does not reach the 
selected level of significance; the re- 
sults for the several judges taken to- 
gether might still attain significance. 

When each judge judges each ob- 
ject only once, there is no satisfactory 
direct test of the observations as a 
totality. If the nature of the material 
permits each judge to make a number 
of judgments about each object, each 
judgment being truly independent of 
all previous judgments for that ob- 
ject, his score for that object can be 
stated as a proportion and an overall 
test could be developed on the basis 
of the discrepancies between these 
proportions and those predicted from 
the proportions for the given row and 
column. It is ordinarily not possible 
to obtain such a series of independent 
judgments. 

However, indirect tests using the 
judge or object means can be made. 
The experimenter can test whether 
the overall proportions for the several 
objects (each proportion being the 
mean success per object) differ from 
chance. He can also make a test of 
the set of mean successes per judge. 
These will be considered below. It 
should be noted here that while these 
two tests are of the same grand mean, 
they may lead to different conclusions 
because the variance of the judge 
means is ordinarily different from 
that of the object means. 
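
A small numerical sketch may make this point concrete. The hit matrix, the chance proportion, and the function name below are all hypothetical, and the ratio computed is simply the deviation of the mean from chance divided by its standard error; it is offered as an illustration rather than as the test the author has in mind.

```python
import math
from statistics import mean, stdev


def mean_test(scores, chance):
    """Ratio of (mean - chance) to the standard error of the mean."""
    standard_error = stdev(scores) / math.sqrt(len(scores))
    return (mean(scores) - chance) / standard_error


# Hypothetical data: rows are objects, columns are judges (1 = hit, 0 = miss).
hits = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
]
chance = 1 / 3  # assumed chance proportion of hits

object_means = [mean(row) for row in hits]        # mean success per object
judge_means = [mean(col) for col in zip(*hits)]   # mean success per judge

# Both sets of means share the same grand mean, yet the two ratios differ
# because the spread (and the number) of the means differs.
print(mean(object_means), mean_test(object_means, chance))
print(mean(judge_means), mean_test(judge_means, chance))
```

The two ratios also rest on different degrees of freedom (objects minus one and judges minus one), which is a further reason the two tests need not agree.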

There is a special case of this mul- 
tiple judge and multiple object de- 
sign in which one assumption of ran- 
dom sampling is not made: examples 
are studies of particular judges who 
are distinguished by their obvious 
expertness, or studies of a finite and 
small class of objects of special inter- 
est. Here the experimental question 
concerns the performance of these 
judges or judgability of these objects. 
Their role in the design is analogous 
to that of the fixed constants in one 
model for analysis of variance, or to 
the operations or instrument for 
measurement in research in general. 
The conclusions are limited to the 
instrument since no random sampling 
is involved in the selection. 

In such designs, the appropriate 
procedure is to obtain a score for each 
application of the instrument, and to 
test the mean of these scores. For 
example, one might determine 
whether a group of experts can make 
a blind diagnosis of each of a random 
set of cases, the score being the num- 
ber of experts who correctly diag- 
nosed each case. The mean number 
right would be compared with the 
mean expected on a chance basis (as 
derived from the actual distribution 
of judgments over categories), the 
error term being based on the distribu- 
tion of scores for the several cases. 

An alternative technique is to de- 
termine a p value for each case, and 
then combine the set of such values 
for the whole set of cases. The ex- 
pected proportion is the proportion 
of all judgments which fall in the 
category to which the case belongs. 
Given this value, the probability of 
obtaining or exceeding the observed 
number of successful judgments for 
the case can be found in tables for 
the binomial distribution. The p 
values for the several cases can be 
pooled by the chi square transforma- 
tion of the several p values (Jones & 
Fiske, 1953). This method assumes 
that the degree of success of the 
judges on one case does not affect 
the degree of success on any other 
case. 
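
The per-case probability described above can be computed directly instead of being read from binomial tables. The following is a minimal sketch; the function name and the numbers in the usage lines are hypothetical.

```python
from math import comb


def binomial_upper_tail(successes, n_judges, p_expected):
    """P(X >= successes) for X distributed Binomial(n_judges, p_expected)."""
    if not 0 <= successes <= n_judges:
        raise ValueError("successes must lie between 0 and n_judges")
    if not 0.0 <= p_expected <= 1.0:
        raise ValueError("p_expected must lie between 0 and 1")
    return sum(
        comb(n_judges, k) * p_expected**k * (1 - p_expected) ** (n_judges - k)
        for k in range(successes, n_judges + 1)
    )


if __name__ == "__main__":
    # Hypothetical case: 10 judges, 7 of whom place the case in its true
    # category, when 40% of all judgments fall in that category.
    print(binomial_upper_tail(7, 10, 0.40))
```

One such value is obtained for each case, and the set of values is then pooled as the text describes.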

A refined method for evaluating 
the performance of each judge has 
been suggested to the author by Lee J. 
Cronbach in a personal communica- 
tion. The set of O judgments made 
by each judge can be scored in the 
usual way by counting the number of 
hits or correspondences with the 
criterion classifications or identifica- 
tions. Then this same set of judg- 
ments by this j, treated as a distribu- 
tion, is randomly assigned to the sev- 
eral o’s, and the number of hits noted. 
This process is repeated for a number 
of trials sufficiently large to provide 
a sampling distribution from which 
one can estimate the chance proba- 
bility of obtaining the actual number 
of hits earned by this j. This method 
takes into account the judge’s biases 
or preferences for certain categories. 
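
The random reassignment procedure lends itself to a short simulation. The sketch below is one way the steps might be programmed, not a procedure given in the source; the function names, the number of trials, the seed, and the data in the usage lines are assumptions made for illustration.

```python
import random


def count_hits(judgments, criterion):
    """Number of positions at which a judgment matches the criterion."""
    return sum(j == c for j, c in zip(judgments, criterion))


def randomization_p_value(judgments, criterion, trials=10000, seed=0):
    """Estimate the chance probability of at least the observed hits.

    The judge's own set of judgments is kept intact as a distribution
    and is reassigned to the objects at random on every trial, so the
    judge's category preferences are preserved.
    """
    if len(judgments) != len(criterion):
        raise ValueError("one judgment is needed per object")
    rng = random.Random(seed)
    observed = count_hits(judgments, criterion)
    shuffled = list(judgments)
    at_least = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if count_hits(shuffled, criterion) >= observed:
            at_least += 1
    return observed, at_least / trials


if __name__ == "__main__":
    # Hypothetical judge and criterion classifications for six objects.
    criterion = ["A", "A", "B", "B", "C", "C"]
    judgments = ["A", "A", "B", "C", "C", "A"]
    print(randomization_p_value(judgments, criterion))
```

Because only the order of the judge's own judgments is shuffled, a judge who overuses one category is compared against chance results that overuse it to the same degree.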

For an oversimplified illustration, 
suppose that four o’s have the actual 
classification of A, B, B, C, and sup- 
pose j judges them to be A, C, B, A, 
respectively, giving him two hits. 
Since the number of o’s is small, one 
can in this case determine the exact 
probability of two or more hits by 
comparing with the criterion order 
(A, B, B, C) each of the 12 possible 
orders of two A’s, one B, and one C. 
We find that 4 of the 12 yield two or 
more hits and hence the probability 
of this judge making two or more hits 
is .33. This probability is lower than 
the expected probability based on 
the assumption that the probability 
is 1/3 of a hit on each o. 
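
The count in this illustration can be checked by enumerating the distinct orders directly. The sketch below uses the criterion order and the judge's judgments given in the illustration; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def exact_tail_probability(judgments, criterion):
    """Exact probability of at least the observed number of hits when
    the judge's set of judgments is assigned to the objects at random."""
    observed = sum(j == c for j, c in zip(judgments, criterion))
    # Distinct orders only; each arises from the same number of
    # underlying permutations, so the distinct orders are equally likely.
    orders = set(permutations(judgments))
    favorable = sum(
        1
        for order in orders
        if sum(j == c for j, c in zip(order, criterion)) >= observed
    )
    return observed, len(orders), Fraction(favorable, len(orders))


observed, n_orders, prob = exact_tail_probability("ACBA", "ABBC")
print(observed, n_orders, prob)  # 2 12 1/3
```

Complete enumeration is practical only when the number of objects is small; for longer series the random reassignment procedure gives an estimate of the same probability.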

The same method could be applied 
to each object, with the resulting 
values of p for the O objects being 
evaluated as a set. This would ordi- 
narily be less appropriate than the 
approach based on the judges, be- 
cause it is known that individual 
judges have response sets and other 
biases which should be taken into ac- 
count. The extent to which judg- 
ments about a particular o are biased 
will probably be of smaller magnitude 
and will typically be of lesser interest. 
The selection of approach should be 
based primarily on one’s objective: 
if the objects are viewed as the instru- 
ment tested, p values are obtained 
for judges and an inference is made 
about the sampled population of 
judges. 

This method has the same type of
rationale as that employed by Crow
(1954) to evaluate judges’ predictions
of the responses of each of a set of
patients. In this Random Compari-
son Method of Chance Determina- 
tion, the distribution of predictions 
for all judges on all objects was com- 
pared to the actual behavior of each 
object and the resulting sets of dis- 
crepancy scores were pooled. It was 
then possible to determine for how 
many of the objects the given judge 
had done better than the median for 
all possible judge-object pairings. 
(For a study comparing correspond- 
ence between two records from each 
case with correspondence between 
paired records from different cases, 
see Kelly & Fiske, 1951, pp. 135- 
140.) 

Up to this point, only designs for 
testing the obtained mean success 
have been considered. It should be 
noted that such means may be signifi- 
cantly below (as well as above) the 
value expected from chance matching 
due to systematic errors in judgment. 
A complete analysis of such data 
would also test whether the variance 
of the obtained scores was signifi- 
cantly different from that expected 
with chance matching, since signifi- 
cant differences between judges or 
between objects, or significant judge- 
object interaction may be present, 
even when the mean is not signifi- 
cantly different from chance. 

It must be emphasized again that 
in all matching studies involving 
multiple judgments about a set of 
objects, the several observations can- 
not be treated as independent and 
the experimental design must take 
explicit cognizance of this restricting
condition.


Testing for Differences between Judges 
or between Objects 

The question of differences between
the performances of judges or differ-
ences between objects in the ease with
which they are correctly judged is
often of interest, even when the non- 
randomness of the judgments has not 
been tested. If nonrandomness has 
been tested, the question of individ- 
ual differences may be of interest 
even when the mean does not depart 
significantly from chance. 

Such testing for differences would 
be straightforward if the observations 
were a continuous variable: e.g., if 
each object could be judged at several 
different times by each judge. When 
the data are recorded as discrete 
observations (1 or 0), techniques such 
as conventional analysis of variance 
are not suitable. 

One can, however, use the analysis 
of variance of ranks or W, the co- 
efficient of concordance (Kendall, 
1948). To test for differences between 
objects, we would rank the O values 
for each judge—i.e., we would assign 
the appropriate (tied) rank to all 
successes and similarly with the fail- 
ures, making the appropriate adjust- 
ment for the ties. As in all designs 
where the objects are judged by all 
judges, it would be necessary to have 
the order of judgment randomized 
to control for effects of position on 
success. 

Also appropriate is the chi square 
test developed by Cochran (1950), 
which is available in Siegel (1956). 
This is a generalization of the test for 
two related groups that has been 
offered by McNemar (1955, pp. 228- 
231). Empirical examples indicate 
that, when the observations take the 
values 1 or 0, Cochran’s statistic, Q, 
gives essentially identical results with 
the chi square for ranked data that is 
related to the coefficient of concord- 
ance, provided the correction for ties 
is made. 
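
The agreement between the two statistics can be illustrated with a small computation. In the sketch below, rows are judges and columns are objects. The data are hypothetical, and the rank statistic is computed in one common tie-corrected form (tied values share their average rank, and the observed within-row rank variance serves as the denominator), which is assumed here to correspond to the correction for ties mentioned above.

```python
def cochran_q(table):
    """Cochran's Q for the columns of a table of 1/0 entries."""
    k = len(table[0])
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    row_totals = [sum(row) for row in table]
    total = sum(row_totals)
    numerator = (k - 1) * (k * sum(g * g for g in col_totals) - total ** 2)
    denominator = k * total - sum(r * r for r in row_totals)
    return numerator / denominator

def midranks(row):
    """Ranks 1..k within a row, with tied values sharing their average rank."""
    order = sorted(range(len(row)), key=lambda j: row[j])
    ranks = [0.0] * len(row)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and row[order[end + 1]] == row[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks

def rank_chi_square(table):
    """Chi square from ranks (analysis of variance of ranks), tie-corrected."""
    n, k = len(table), len(table[0])
    ranks = [midranks(row) for row in table]
    col_sums = [sum(r[j] for r in ranks) for j in range(k)]
    expected_sum = n * (k + 1) / 2
    between = sum((s - expected_sum) ** 2 for s in col_sums)
    within = sum((x - (k + 1) / 2) ** 2 for r in ranks for x in r)
    return (k - 1) * between / within

# Hypothetical data: 6 judges (rows) by 4 objects (columns), 1 = success.
table = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
]
print(cochran_q(table), rank_chi_square(table))
```

Either value would be referred to the chi square distribution with one fewer degree of freedom than the number of columns.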

One caution should be stated. As 
Cochran (1950) points out for his 
statistic, rows with identical entries 
(all successes or all failures) have no 
effect on the Q for columns. Q is
equivalent to the chi square from
ranks only when the computation of 
the latter statistic omits such rows. 
This omission of arrays without 
variance, comparable to the exclusion 
of ties in the sign test, may influence 
the experimenter’s interpretation 
when a substantial proportion of the 
rows is involved. 
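
The point about rows with identical entries can be verified numerically. The sketch below uses hypothetical data and hypothetical function names; it computes Q with and without the rows that are all successes or all failures, and reports the proportion of rows so excluded, since that proportion bears on the interpretation.

```python
def cochran_q(table):
    """Cochran's Q for the columns of a table of 1/0 entries."""
    k = len(table[0])
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    row_totals = [sum(row) for row in table]
    total = sum(row_totals)
    numerator = (k - 1) * (k * sum(g * g for g in col_totals) - total ** 2)
    denominator = k * total - sum(r * r for r in row_totals)
    return numerator / denominator

def informative_rows(table):
    """Rows containing at least one success and at least one failure."""
    return [row for row in table if 0 < sum(row) < len(row)]

# Hypothetical data: the last three rows have identical entries throughout.
table = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
kept = informative_rows(table)
q_all = cochran_q(table)
q_kept = cochran_q(kept)
excluded_share = 1 - len(kept) / len(table)
print(q_all, q_kept, excluded_share)
```

Here Q is unchanged by the exclusion, although a third of the rows contribute nothing to it.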

The row means are taken as fixed 
in Cochran's test, and it would appear 
that the same holds for the chi square 
from ranks. For example, a different 
set of row means might involve a 
greater or smaller number of rows 
with identical values. In general, 
variances of row and column means 
tend to be negatively related: if the 
effect for objects is large, it may pre- 
clude the possibility of a significant 
effect for judges. Therefore, the in- 
ference from such data must be re- 
stricted to populations with distribu- 
tions of row means similar to that for 
the data at hand. 

When the experiment involves re- 
peated judgments by each judge 
about each object, or when each judge 
judges several objects of each of sev- 
eral kinds, the entries can be propor- 
tions rather than 1 or 0. Under these 
conditions, it is possible to control 
column effects while testing for row 
effects. Mood (1950, pp. 399-402) 
presents two methods, an exact test 
for small Ns and a test based on chi 
square. They require the assumption 
that the interactions are zero, unless 
either the judges or the object classes 
have been randomly chosen. 

THE MULTIPLE CATEGORY CASE 

The preceding presentation 
neglected the categories to which the 
objects belong. In most designs, there 
are several categories with several 
objects in each. The same principles 
of design and inference apply to this 
case. In a design where the same
judge or judges are used for all ob-
jects, the conclusions are limited to
the judge(s) as an instrument. The
accuracy of judgments should be 
evaluated for each category sepa- 
rately. This can be done by obtaining 
a score for each case: e.g., the proba- 
bility of m judges being correct with 
the chance value being determined 
from the relative frequency of the 
category in the obtained judgments. 
When the relative accuracy for one 
case can be assumed to have no effect 
on the accuracy for any other case, 
these p values can be pooled. 
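
One way of obtaining such a score for each case is sketched below. The chance value for a category is taken as its relative frequency among all obtained judgments, and the score is the binomial probability that m or more of the judges would be correct by chance. The data and names are hypothetical, and the use of the upper tail, with judges treated as independent for a given case, is an assumption of this sketch.

```python
from collections import Counter
from math import comb

def category_chance_values(cases):
    """Relative frequency of each category among all obtained judgments."""
    counts = Counter(j for _, judged in cases for j in judged)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}

def prob_at_least(m, n_judges, p):
    """Binomial probability of m or more correct judgments out of n_judges."""
    return sum(
        comb(n_judges, i) * p**i * (1 - p) ** (n_judges - i)
        for i in range(m, n_judges + 1)
    )

# Hypothetical data: (criterion category, judgments of five judges) per case.
cases = [
    ("A", ["A", "A", "B", "A", "A"]),
    ("B", ["B", "A", "B", "B", "C"]),
    ("C", ["C", "C", "A", "B", "C"]),
    ("A", ["A", "B", "A", "A", "C"]),
]
chance = category_chance_values(cases)
case_scores = []
for category, judged in cases:
    m = sum(j == category for j in judged)
    case_scores.append(prob_at_least(m, len(judged), chance[category]))
print(chance, case_scores)
```

The resulting values for cases of one category can then be examined separately from those for the other categories.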

The multiple category design has 
one type of dependence not found in 
the simpler designs: the relative ac- 
curacy for one category may affect 
that for other categories. In the ex- 
treme case of two categories, A and 
B, the experimenter cannot distin- 
guish between success in judging 
cases as A and success in judging 
cases as not-B, and conversely for B 
and not-A; hence no inference is 
possible concerning the relative suc- 
cess for the two categories. 

Whenever the number of categories 
is small, some interdependence will 
be present and any tests of differen- 
tial success for the several categories 
will be biased in favor of accepting 
the hypothesis of no difference unless 
this dependence is taken into account. 
However, an approximate test might 
be a one-way analysis of variance 
where each entry represented the 
difference, for a single case, between 
the obtained proportion and the ex- 
pected proportion of correct judg- 
ments. (An index of this variety, 
suggested by David Wallace, was 
used in a complex matching study by 
Henry & Farley, 1959.) 
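
A rough sketch of this approximate test follows. Each entry is the difference, for a single case, between the obtained and the expected proportion of correct judgments, and the entries are grouped by category for a one-way analysis of variance. The data and names are hypothetical; only the F ratio and its degrees of freedom are computed, and the dependence among categories noted above is not removed by the computation.

```python
def one_way_f(groups):
    """F ratio and degrees of freedom for a one-way analysis of variance."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical cases: (category, obtained proportion, expected proportion).
cases = [
    ("A", 0.6, 0.4), ("A", 0.8, 0.4), ("A", 0.7, 0.4),
    ("B", 0.3, 0.3), ("B", 0.4, 0.3), ("B", 0.2, 0.3),
    ("C", 0.4, 0.3), ("C", 0.6, 0.3), ("C", 0.5, 0.3),
]
differences = {}
for category, obtained, expected in cases:
    differences.setdefault(category, []).append(obtained - expected)

f_ratio, df_between, df_within = one_way_f(list(differences.values()))
print(f_ratio, df_between, df_within)
```

The F ratio would be referred to the F distribution with the degrees of freedom shown.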


SUMMARY 


This paper considers some prob- 
lems of design and inference that are 
found in studies using the matching 
method with multiple judges and ob- 
jects. When each object is judged by 
each judge, the analysis must take
into account the dependence among
the several observations in each set. 
Failure to recognize and allow for this 
dependence is a common oversight 
in the design of matching studies. 
Frequently the judges or the objects
must be viewed as an instrument, the
conclusion being restricted to “these
judges” or “these objects.” Tech-
niques are considered for testing for
differences between judges, or differ-
ences between objects.


COMMENT ON “A NOTE ON PROJECTION”


BERNARD I. MURSTEIN¹
Interfaith Counseling Center, Portland, Oregon 


In a recent issue of the Psycho- 
logical Bulletin Chase (1960) dis- 
cussed an article by Murstein and 
Pryer (1959). Referring to this 
article, Chase stated that there appear
to be “rather glaring faults in formu-
lation, categorization and definition”
(p. 289). It would seem, however,
that Chase misses the mark in two 
important respects: 

1. There is a gross confusion in 
differentiating what Pryer and I say 
about projection in the critique por- 
tion of our paper from what we report 
others as saying in our review of the 
research. 

2. There are several misinterpreta- 
tions of psychological terms in Chase’s 
criticism. 


We defined “attributive” projec-
tion as “The ascribing of one’s own
motivations, feelings and behavior to
other persons” (Murstein & Pryer,
1959, p. 354). Chase (1960), finding 
this definition similar to the one listed 
by English and English (1958), de- 
scribes our definition as “clearly re- 
dundant” (p. 289). It should be 
noted, however, that the purpose of a 
review is not to invent new defini- 
tions, but to classify and order the 
myriad research publications bearing 
on the topic under consideration. 
The fact that our definition overlaps
with one given by English and Eng-
lish indicates only that we have suc-
ceeded in identifying one of the com-
mon usages of the term.

1 I should like to express my appreciation to
E. Nelson Pareis and Martha Pareis for
reading the manuscript and offering their
valuable comments.

2 As senior author, I assume full responsi-
bility for all “glaring faults in formulation,
categorization and definition” attributed to
our paper by Chase and accordingly am re-
plying to his comments.

We are further belabored by Chase 
for our putative disregard of the 
“unconscious” and the “self-concept”
in our discussion of “attributive”
projection. Again, there is a failure to 
understand that we did not advocate 
this omission but only reported a 
condition familiar to most persons 
conversant with the literature on 
projection—namely, that most opera- 
tional usages of the term imply noth- 
ing about an unconscious or a self- 
concept. A typical operational defini- 
tion is described by Bender and 
Hastorf (1953). A statement “I am 
wary about the trustworthiness of 
persons whom I do not know well” 
may be answered affirmatively by a 
subject (S). If he now predicts that
his friend would answer the item 
similarly we have an example of 
attributive projection. The con- 
gruency of answers, however, may be 
based on the similarity of the per-
sonalities, answering habits, experi-
ences, or cognitive evaluations of the
two Ss without implying any notions 
about the unconscious or self-con- 
cept. It is exactly this lack of con- 
cern with the self-concept which led 
Pryer and me to criticize this ap- 
proach! 

Chase criticizes our use of termi- 
nology. Only the most meaningful of 
these comments will be considered. 

1. We quoted Zilboorg’s example 
(1935) of medieval projection which 
Chase regards as an example of an 
hallucination rather than of projec- 
tion. He failed to realize that an 
hallucination is but one kind of pro-
jection. Thus, Murray (1933) says:
We may speak of perceptive projection when 
sensory elements are projected, i.e., when an 
image takes on the vividness, substantiality, 
and out-thereness of a real object—as in... 


an hallucination (p. 313). 


2. Chase believes that our term 
“rationalized projection” is simply 
another name for rationalization. 
Rationalized projection refers to an 
occasion in which S, while not deny-
ing the possession of an unattractive
trait or the fact that he committed
some unsavory act, does deny re-
sponsibility by projecting the cause
of his behavior onto another. Ra-
tionalization is defined by English 
and English (1958) as 
the process of giving rational order or inter-
pretation to what was previously merely a
vague intuition, or was chaotic and confused
(p. 438).


Since rationalization may serve to 
make an event comprehensible with- 
out necessarily involving recourse to 
self-deception, it is apparent that 
rationalized projection is but one of 
many kinds of rationalization. 

3. A similar confusion of species 
with genus underlies Chase’s attempt 
to equate “autistic projection” with
autism. The former is a term used to
describe misperceptions due to hun-
ger, thirst, or a “set” of some kind.
The latter is defined by English and 
English (1958) in one of their defini- 
tions as 
finding pleasure in fantasies that represent 
reality in wish-fulfilling terms, even when 
these are not believed (p. 54).
It is evident that autistic projection
is one form of autistic expression. 

4. In an attempt to encompass the 
extremely diverse and varied uses of 
projection, Pryer and I settled for a 
broad definition of it which included 
emotional value or need. Chase inter-
prets this as subsuming “defecation.”
It is, however, unusual to regard this 
act as involving an emotional value 
for the majority of persons, Freud 
notwithstanding. 

Finally, a new classification of
types of projection is offered by
Chase (1959).

Two major categories are immediately obvious
We might term one type defensive projection
and the other predictive projection (p. 289).

I should like to illustrate via a brief 
example why this division is not satis- 
factory. On the eve of election day in 
1948, a noted commentator, H. V. 
Kaltenborn, predicted a Dewey vic- 
tory. Far into the night, as the stun- 
ning reversal of expectation became 
manifest, Kaltenborn relied on his 
long experience to avoid being swayed 
by “early city returns” which favored 
Truman. Surely this is a bona fide 
case of predictive projection, but
does it not also smack of defensive
projection? 

Though I question the validity of
Chase’s criticisms, there is no doubt
that a good deal of work still remains 
in the area of projection. Hopefully, 
we will yet evolve an operational 
definition retaining the historical 
meaning, which can also be experi-
mentally validated.